Make a type generic over u8 and u16 only?

I'm trying to make a data structure that's backed by a Vec of either u8 or u16, depending on the expected number of elements in it. It will also be assumed that the max length is either u8::MAX or u16::MAX respectively.

struct Thing<U>
where U: u18 or u16 {
	indices: Vec<U>
}

It would be useful to be able to switch from a Thing<u8> to a Thing<u16>, i.e.

let thing = Thing<u8>::new();

...

if thing.indices.len() > 200 {
	let new_thing: Thing<u16> = thing.to_bigger();
}

but I would hope to not end up with two generated versions of the code in the final binary... a u16 is not that different from a u8.

I found two StackOverflow questions about this, but they're quite old and solutions a bit tedious.

The first one states just what I'm looking for, but the answer was to make a trait for the generic, then implement the trait for the types I want, i.e. u8 and u16.

pub trait ThingIndex {}

impl ThingIndex for u8 {}
impl ThingIndex for u16 {}

Then I'd have to explicitly declare the functionality of the u8 and u16 types and I'm not sure how to use it:

ThingIndex: Add<Output = Self> + Mul<Output = Self> + ... {}

Another solution was to use an enum:

enum Unsigned {
    U8(u8),
    U16(u16),
}
struct Thing {
	indices: Vec<ThingIndex>
}

But then each item in indices could be either one, which is not what I want.

Here in the Rust forum I found a few suggestions from quinedot, of which the third one looked most appealing. I admit though, I don't use traits as much as I should and don't understand the suggestion.

Through no fault of the authors of the mentioned solutions, I find that none of them convey the intent of the desired struct.

Are any of these solutions the de facto way to do it?
Does it get compiled down to simply the u8 or u16, without adding extra things like tags, padding, etc.?

One thing that bothers me when indexing into an array or vector, is having to cast to usize. I know it's a trivial operation, but if for example Thing<u16>.indices can only be as many as u16::MAX (because using Thing's API would ensure as much), and indices stores u16 and not usize, it doesn't make sense to have to fill in a usize with extra bits just to index into indices.

By that, I wish there was a way to index into it directly without having to cast or implement Into or From, which just does the cast there anyway.

I saw your proposed Thing struct has a Vec in it. A u16 is a different size from a u8 so I don't see how sharing machine code would be possible. In one case, to read a value from index i you would have to read 2 bytes from offset 2 * i and in the other you need 1 byte from offset i, so different instructions would be required. Pretty much any operation that touches memory (i.e. cannot be done entirely in registers) would require separate instructions for the two types. The generic + trait bound solutions you have posted would also generate separate code for u8 and u16 in most cases.

For writing code that is generic over u8 and u16, I have seen two approaches: the generic + trait bounds approach you have already posted, or writing all your code in a macro and invoking that macro on u8 and u16. The macro approach has the advantage of not needing to write a bunch of verbose trait bounds but the disadvantage is all your code is in a macro. There is a slightly cursed 3rd idea I have had but never seen anywhere which is abusing the #[path] attribute on modules to avoid both macros and trait bounds. Consider:

//! main.rs

mod mod_u8 {
    type U = u8;
    pub type Thing8 = thing::Thing;

    #[path = "../thing.rs"]
    #[allow(clippy::duplicate_mod)]
    mod thing;
}

mod mod_u16 {
    type U = u16;
    pub type Thing16 = thing::Thing;

    #[path = "../thing.rs"]
    mod thing;
}

fn main() {
    let a = mod_u8::Thing8 { inner: 255 };
    let b = mod_u16::Thing16 { inner: 255 };

    println!("{}, {}", a.wrapping_add(1), b.wrapping_add(1)); // 0, 256
}

and

//! thing.rs

use super::U;
pub struct Thing {
    pub inner: U,
}

impl Thing {
    pub fn wrapping_add(&self, add: U) -> U {
        self.inner.wrapping_add(add)
    }
}

This is something I have wondered too. Since the library includes impl From<u8> for usize and impl From<u16> for usize, I am not sure if there is some reason it can not also include impl<T> SliceIndex<[T]> for u8 and u16.

2 Likes

What kind of operations are you doing with he elements? Are they supposed to be indices?

Maybe you can write the collection as an enum of either Vec<u8> or Vec<u16>, and always convert the element to u16 on get, then back to the underlying type on set/insert? Also you could implement Index<u8|u16> directly this way.

Rust constrains generic parameter by traits, not by concrete types, so the "only" requirement goes against that. You can make a type generic over u8 and u16 and also any other type that implements the required traits.

Here is an example of a data structure I made recently where the indexes can be u8, u16, u32, u64... etc. : heap.rs - source

Hope that helps!

[ Edit: I think you technically CAN provide implementations only for specific types, but then you end up with duplicated code, it is messy... ]

1 Like

Afair, The assembly on most architectures has to do that anyways.

4 Likes

You want something like this:

enum Thing {
    Short(Vec<u8>),
    Long(Vec<u16>),
}
2 Likes

If you want to use generics, I would suggest using num-traits for convenience. Type parameters in Rust have to work for any type the caller puts in, as long as that type implements the required traits. That is, the set of valid types T is constrained by the trait bounds, but you aren't allowed to assume what the concrete set is. You can only add more constraints.

That said, I think the solution depends on what you actually need to do. First, I would suggest not putting all the trait bounds on the struct definition. That requires you to front load everything you ever need to do and forces you to repeat those bounds everywhere.

Secondly, I would suggest thinking about what you need to implement uniquely for each type and what is shared. You can probably have concrete impl Thing<u8> and impl Thing<u16> blocks where you know exactly what types you are dealing with. The remaining functionality that both need to do the same way can be in one or more impl<T> Thing<T> blocks with varying trait bounds.

If you have some place where you need to go from a generic type to a specific type, you can have a trait that implements conversion to an enum with either variant of Thing and then match on it. Going the other direction is trickier and depends on the use case, but sometimes possible.

On x86, this is usually automatic. Operations that act on the low 16 or 32 bits of a register clear the remaining bits.

My understanding is that this would break type inference.

Nope.

Yes for 32 bits, no for 16 bits or 8 bits. If you don't know how hardware that you target works then maybe, just maybe, it's a bit too early to worry about what happens at that level, don't you think?

1 Like

Thanks for the correction.

1 Like

That's an interesting approach I might explore, thanks for sharing! I'm curious though, what if the code becomes part of a crate I'd use from the crate registry instead of from a local.. would the relative paths to thing.rs still work the same?

I would not try to use the #[path] abuse method across crates. Even if someone managed to find a way to do it, that makes your code dependent on the source file hierarchy of the other crate, not just the public API and that is too cursed even for me to contemplate. If you want crate users to be able to use Thing with types other than just the types you provide, I would go with the generic + trait bounds method.

The u8s or u16s stored in the Vec are indices, and the only other operation I'm doing with them besides indexing is addition. Similar to IDs in databases, they find their way around the entire code base and other structs will store some of the indices too.

Because of the amount of times I use them to index into slices, unwrapping an enum at every site is not thrilling and I think that would also be more overhead than casting. Though at least the enum wouldn't have the extra padding of being the size of the bigger unsigned integer like it would if I used the enum Unsigned {...} approach I mentioned originally, since it will always be 24 bytes for the vec.

Rust constrains generic parameter by traits, not by concrete types

That's helpful in understanding why this, and generics in general, seem so verbose.

I read through your code and guess that this is what restricts it to the unsigned integers? Does signed integers not implement those, and that's why your GHeap doesn't accept signed integers?

Huh.. I didn't know this. I thought Rust decided on usize as the default type for indexing because it was the common case and was more efficient for that case... not that it was actually the only way at the end of the day..

that makes your code dependent on the source file hierarchy of the other crate

Yeah that's kinda what I was thinking.. I'll stick to what's best for a public crate, but it's still a neat approach though.

The generic approaches probably won't help on their own if you want to dynamically switch between Thing<u8> and Thing<u16>, like your OP said. First of all, the converted object will need to switch to some other code path that deals with Thing<u16>. Second of all, if other structs are around holding onto indices into the converted object, how are they going to know they're associated with a new type now? If it's a dynamic distinction, they'll have to check dynamically, e.g. via an enum.

You can do it with trait objects instead, but "follow vtable pointers" is probably worse than "branch on enum variant", performance-wise.

Additionally, in your OP you also said the Thing<u8> wouldn't have more than 255 items, so we're talking about a size savings of perhaps 1/4th of a kilobyte over Thing<u16> (internally). That seems like not much juice for a lot of squeeze, over just using u16 always.

(Having different versions to choose from statically (at compile time) is another story, and that's what generics / macro'd up implementations will get you.)

2 Likes

if other structs are around holding onto indices into the converted object, how are they going to know they're associated with a new type now?

That's a good point! I was thinking this could be taken care of during the to_bigger() step, but I haven't given much thought to the idea and you're right that the savings, if any, are probably not worth it.

I was thinking of how other people might potentially use it, but for my use case the size is determined at compile time so.. it probably is best to not handle a case that isn't proven to be needed.

It does accept signed integers (for the indexing type), except for i8 which doesn't implement From<u8> ( since numbers in the range 128..255 can be represented by u8 but not by i8 ). i16, i32, i64 are all fine though.
This bound is actually only used to convert integers 0,1 and 2, so is stricter than necessary, but there isn't any unsigned system type smaller than u8. What I need there really is a constraint From<u2>.

That said, there is no reason to use i8 (or any signed integer type come to that) here, so I don't think it is a real shortcoming.

1 Like