Add string attribute using hdf5-rust

Been banging my head against this for far too long, and I suspect somebody out there can answer this in a few seconds...

How can you write and read string-valued HDF5 attributes using the hdf5-rust crate?

In other words, what is the hdf5-rust equivalent of the following h5py sample?

import h5py
f = h5py.File('a_file', 'w')
g = f.create_group('a_group')
g.attrs['attr-name'] = 'string value'
...
f = h5py.File('a_file', 'r')
assert f['a_group'].attrs['attr-name'] == 'string value'

Not sure (I haven't used this library yet), but have you been through the Location methods, which also apply to groups? I'm thinking of .new_attr() and going from there. It doesn't look like there's a one-liner, but it does seem possible to set a string attribute.

cc @aldanor if you know and have time?

Sorry, I should have been more specific. I can write (and read) vector (and even multi-dimensional) attributes, like this:

group
    .new_attr_builder()
    .with_data(&[9.8, 7.6])
    .create("attr_name")?;

(or even using new_attr instead of new_attr_builder)

What I can't manage is writing scalar attributes or string attributes. Actually, I can write string attributes, but they end up stored as arrays of bytes rather than as strings.

Here is an example of adding string attributes:

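use std::ops::Deref;

use hdf5::types::VarLenUnicode;
use hdf5::{Location, Result};

pub fn create_str_attr<T>(location: &T, name: &str, value: &str) -> Result<()>
where
    T: Deref<Target = Location>,
{
    // Scalar attribute holding one variable-length UTF-8 string.
    let attr = location.new_attr::<VarLenUnicode>().create(name)?;
    // Parsing fails only if `value` contains an interior NUL byte.
    let value_: VarLenUnicode = value.parse().unwrap();
    attr.write_scalar(&value_)
}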

Thanks, that works, though there are some glitches

  1. Some names need to be fully qualified. This is trivial; I've done so in the version included below.

  2. T: Deref<Target = hdf5::Location> works if you pass in a group but fails if you pass in a dataset. Changing it to T: Deref<Target = hdf5::Container> makes it work with datasets but breaks it for groups. I haven't found a one-size-fits-all solution.

    The sample below shows the body of create_str_attr inlined into the client code, as well as a call to the function. The inlined code works for both datasets and groups, while the function call works for only one of them, depending on whether Target is Location or Container.

fn create_str_attr<T>(location: &T, name: &str, value: &str) -> hdf5::Result<()>
where
    T: std::ops::Deref<Target = hdf5::Container>,
{
    let attr = location.new_attr::<hdf5::types::VarLenUnicode>().create(name)?;
    let value: hdf5::types::VarLenUnicode = value.parse().unwrap();
    attr.write_scalar(&value)
}

fn main() -> hdf5::Result<()> {

    // stop HDF5 from printing its error stack on every failure
    let _suppress_errors = hdf5::silence_errors(true);

    let file_name = "/tmp/attribute_test.h5";
    let group_name = "the_group";
    let dataset_name = "the_dataset";

    let file = hdf5::File::create(file_name)?;
    let group = file.create_group(group_name)?;
    let dataset = group
        .new_dataset_builder()
        .with_data(&[1.2, 3.4])
        .create(dataset_name)?;

    let attr = dataset.new_attr::<hdf5::types::VarLenUnicode>().create("unicode_attribute")?;
    let value: hdf5::types::VarLenUnicode = "‽🚐".parse().unwrap();
    attr.write_scalar(&value)?;

    create_str_attr(&dataset, "another_unicode_attribute", "‽🚐")?;
    Ok(())
}

Some randomish observations:

  1. Location::new_attr and Container::new_attr seem to be related in an ad-hoc way: there is no trait uniting them. (The same is true for new_attr_builder.) This seems to make it impossible to write a single polymorphic create_str_attr that works with both datasets and groups.

  2. Searching for new_attr in the search box of the crate's documentation finds Location::new_attr but not Container::new_attr.

  3. The solution upthread uses Attribute::write_scalar. I can't find an equivalent in AttributeBuilder.

  4. AttributeBuilder does have a with_data_as (in addition to with_data), which takes a TypeDescriptor. TypeDescriptor has a VarLenUnicode variant, which I hoped might let me translate the upthread solution to the AttributeBuilder style, but I haven't managed to.

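Container implements Deref<Target = Location>, and Dataset derefs to Container. So if you have a Dataset, you can deref it explicitly to get a Container and pass that to the Location-based version of create_str_attr, whose Deref bound then takes care of the step from Container to Location. For example, create_str_attr(&*dataset, "encoding-type", "array")?;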
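To see why the explicit &* is needed with the generic version, here is a std-only sketch. `Location`, `Container`, `Group` and `Dataset` below are mock structs that only mimic the Deref chain described in this thread (Group to Location, Dataset to Container to Location); they are not the hdf5 types, and `name_generic` / `name_plain` are hypothetical stand-ins for create_str_attr. A `T: Deref<Target = Location>` bound only matches types whose direct Deref target is Location, whereas a plain `&Location` parameter lets deref coercion walk the whole chain.

```rust
use std::ops::Deref;

// Mock types mimicking the hdf5 Deref chain; not the real crate types.
pub struct Location {
    name: String,
}
pub struct Container {
    loc: Location,
}
pub struct Group {
    loc: Location,
}
pub struct Dataset {
    container: Container,
}

impl Deref for Container {
    type Target = Location;
    fn deref(&self) -> &Location {
        &self.loc
    }
}
impl Deref for Group {
    type Target = Location;
    fn deref(&self) -> &Location {
        &self.loc
    }
}
impl Deref for Dataset {
    type Target = Container;
    fn deref(&self) -> &Container {
        &self.container
    }
}

// Generic version: T must deref *directly* to Location.
pub fn name_generic<T: Deref<Target = Location>>(x: &T) -> &str {
    &x.name
}

// Plain-reference version: deref coercion follows the chain automatically.
pub fn name_plain(loc: &Location) -> &str {
    &loc.name
}

fn main() {
    let group = Group { loc: Location { name: "the_group".into() } };
    let dataset = Dataset {
        container: Container { loc: Location { name: "the_dataset".into() } },
    };

    // Group: Deref<Target = Location>, so T = Group satisfies the bound.
    println!("{}", name_generic(&group));
    // name_generic(&dataset) would not compile: Dataset derefs to Container.
    // &*dataset is a &Container, and Container: Deref<Target = Location>.
    println!("{}", name_generic(&*dataset));

    // With &Location, both work: &Dataset -> &Container -> &Location.
    println!("{}", name_plain(&group));
    println!("{}", name_plain(&dataset));
}
```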

Ah, well spotted. It's a bit ugly, but it works. Thanks!

Apologies for being a bit late to the thread (been traveling for a while).

Here are a few notes:

  1. You don't have to mess with the Deref trait at all, and you don't need the explicit &* either. Instead, you can just rely on Rust's auto-deref (deref coercion). In the example above, it's sufficient to have
fn create_str_attr(location: &Location, ...) {}

and it will work not only with Location but also with anything that derefs to Location (either directly or indirectly). Here's a brief playground example to demonstrate how it works.

  2. Simpler ways of writing scalar attributes are actually at the top of my to-do list already (e.g., something like .write_attr_scalar(k, v) and equivalents for more generic use cases). It's a very common use case and should not involve builders. I'll try to add them soon. There are also a few things present in datasets but missing in attributes and their builders, which should be added for consistency's sake.

  3. For strings, we can probably add special methods for scalar attributes and datasets to simplify the process (e.g. .write_attr_string::<VarLenUnicode>(k, "foo") or even .write_attr_string_vu("foo"); I need to ponder it a bit).

Please let me know if there are any other unresolved questions.

