New crate: include_data. Review for soundness and API

https://crates.io/crates/include_data

A little while ago I was including some static data in an executable and wanted to do this in a type-safe way. I came across various discussions, and this exellent blog post and decided a crate was probably in order. I also briefly spoke to the author (Jack Wrenn) about an earlier iteration of this crate before publishing.

I'm pretty sure this simple crate is sound and in good shape, but I did want to open it up for review to make sure. Please feel free to comment here or open up issues on the repo if you spot anything or have any API comments.

Ideally I'd like this to hit 1.0 very soon.

From your README:

Usage with custom types

You can include data in any custom type you like. The best way of doing this is if your custom type satisfies the requirements for bytemuck::Pod, in which case you can simply use include_data.

#[repr(C)]
#[derive(Copy, Clone)]
struct Foo {
   integer: u16,
   pair: (u8, u8),
}

// Safety: the requirements for `Pod` have been manually checked.
unsafe impl bytemuck::Zeroable for Foo {}
unsafe impl bytemuck::Pod for Foo {}
static FOO_DATA: Foo = include_data!("../tests/test_data/file_exactly_4_bytes_long");

This implementation is unsound, as the representation for tuples is the default representation, meaning that is it largely left unspecified and there are no guarantees of no padding or even that the fields are stored in order of declaration. This could be fixed by replacing (u8,u8) with [u8; 2].

This also does not specify what should happen if the specified file does not happen to contain exactly four bytes. Is it UB, silently ignored or a compile-time error? This should be documented extensively as these are the most important questions I would ask before using this library.
Edit: I just saw it is documented on the macros themselves. Still this also belongs in the main README / crate docs page.


#[macro_export]
macro_rules! include_u8 {
    ($file:expr $(,)?) => {
        $crate::include_slice!(u16, $file)
    };
}

This looks wrong. Are you sure it shouldn't be include_slice!(u8, $file)? In addition, I really don't see that need for these extra methods. Are they really necessary? In my opinion they are redundant in favor of the, also more readable, include_slice. I also find the name confusing and would expect a macro like include_u8 to give me a single value of type u8, not a slice.

5 Likes

Thanks for the review!

This implementation is unsound, as the representation for tuples is the default representation,

Good catch, I'd always assumed tuples were laid out like struct fields with #[repr(C)], but apparently I never actually checked!

Still this also belongs in the main README / crate docs page.

Fair

Are you sure it shouldn't be include_slice!(u8, $file)?

Good catch

Are they really necessary?

Not at all, I may get rid of them entirely, but for the moment they're essentially zero-maintenance.

I also find the name confusing and would expect a macro like include_u8 to give me a single value of type u8, not a slice.

Good point, I'm pluralising all of them.


Changes to address all of your points will be pushed shortly.

1 Like

You should probably say something about byte order in the documentation— If I'm compiling on a little-endian machine for a big-endian target (or vice versa), which endianness should be stored inside the data file?

3 Likes

Thanks, the redrafted README and docs say this much more explicitly. Basically the data is copied in byte-for-byte, the interpretation of the bitpattern happens entirely at runtime. The only thing "happening" at compile time is a copy into a properly aligned memory location. Thus, interpretation of multi-byte sequences is according to the endianness (and everything else) of the target machine.

Minor nitpit:

The README specifies:

This library will work out-of-the-box with any type that implementsbytemuck::Pod.

This link leads to the derive macro page for Pod (Pod in bytemuck - Rust), while it should likely lead to the trait docs: Pod in bytemuck - Rust.

the markup auto-titling of links makes it look like the links are both the same, but the first one leads to the macro and the second one to the trait.


Aliases are provided for include_slice for primitive number types, using them is a matter of personal preference. For example:

The code below still uses include_u32 instead of include_u32s (shouldn't cargo test have prevented this kind of typo, if it is a doctest?). In addition, I am still not really satisfied with the naming (and have never seen (primitive) types pluralized this way), but this is likely just personal preference and/or bikeshedding.

Good catch on the link, thanks (in any case, a redditor pointed out a different trait I should probably be using for the bound), will be fixed later today.

For the alias naming, I think the github readme is up-to-date, I just haven't pushed a new crates.io release yet.

1 Like

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.