Confused by [u8; 4] that cannot be indexed

grothesque · March 11, 2024, 2:34pm

Hello,

The following describes a nut that I've been trying to crack for a couple of hours. Perhaps someone can help me out? (The following code came up in my exercises while learning Rust. I'm aware of the existence of crates for random number generation.)

In the following snippet, the function random_usize reads a usize value from /dev/urandom. I'm trying to come up with a version of it that would be generic over integers. To achieve that, I want to use the traits num_traits::{ToBytes, FromBytes}.

In the generic function random (which compiles, but doesn't work properly yet), I manage to create buf, and its type is [u8; 8], as shown by type_name_of_val. But it's not even possible to index that "buffer" (see the commented-out line), whereas it is possible to index the regular array other, which seemingly has the same kind of type.

Is this again a case of type_name_of_val showing the concrete type of an opaque type?

If I uncomment the line calling f.read_exact(...), the compiler suggests constraining the associated type Bytes of the ToBytes trait. So OK, I can write T: ToBytes<Bytes = [u8; 8]> + Default, and then random runs for usize and for u64, but not for integer types of any other size. (Note that ToBytes<Bytes = [u8]> doesn't work.)

How can I make random generic over arbitrary integers?

use std::fs::File;
use std::io::Read;
use num_traits::{ToBytes, FromBytes};

// helper
fn type_name_of_val<T: ?Sized>(_val: &T) -> &'static str {
    std::any::type_name::<T>()
}

// this works
fn random_usize() -> usize {
    let mut f = File::open("/dev/urandom").unwrap();
    let mut buf = 0usize.to_ne_bytes();
    f.read_exact(&mut buf).unwrap();
    usize::from_ne_bytes(buf)
}

// this doesn't
fn random<T>() -> T
where
T: ToBytes + Default,
{
    let mut f = File::open("/dev/urandom").unwrap();
    let mut buf = ToBytes::to_ne_bytes(&T::default());
    let other = [0u8, 1u8, 2u8, 3u8];
    dbg!(type_name_of_val(&buf));   // type_name_of_val(&buf) = "[u8; 8]"
    dbg!(type_name_of_val(&other)); // type_name_of_val(&other) = "[u8; 4]"
    dbg!(other[3]);                 // this works
    // dbg!(buf[3]);                // <-- error here -->
    // f.read_exact(&mut buf).unwrap();
    // FromBytes::from_ne_bytes(&buf)
    T::default()             // mock-up return value
}

fn main() {
    println!("{}", random::<usize>())
}

vague · March 11, 2024, 2:42pm

In generic code, you can only use the methods and operators that the trait bounds provide.

The [n] operation needs the Index trait, and because

container[index] is actually syntactic sugar for *container.index(index)

reading an element by value moves it out of the container.

For your case that's fine as long as the element type is Copy (u8 is), so you'll also need a Copy bound.

And the dbg! macro needs Debug.

use std::fmt::Debug;
use std::ops::Index;

fn random<T>() -> T
where
    T: ToBytes + Default,
    <T as ToBytes>::Bytes: Index<usize>,
    <<T as ToBytes>::Bytes as Index<usize>>::Output: Copy + Debug,
// ... body as in the question, with dbg!(buf[3]) uncommented

Rust Playground
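A std-only sketch of these Index-based bounds follows. So that it compiles without num_traits, it uses a hypothetical trait `MyToBytes` that only mimics the shape of `ToBytes` (an associated `Bytes` type plus `to_ne_bytes`); `byte_at` is likewise a made-up name.

```rust
use std::fmt::Debug;
use std::ops::Index;

// Hypothetical stand-in for num_traits::ToBytes (same shape, std only).
trait MyToBytes {
    type Bytes;
    fn to_ne_bytes(&self) -> Self::Bytes;
}

impl MyToBytes for u8 {
    type Bytes = [u8; 1];
    fn to_ne_bytes(&self) -> [u8; 1] {
        [*self]
    }
}

impl MyToBytes for u32 {
    type Bytes = [u8; 4];
    fn to_ne_bytes(&self) -> [u8; 4] {
        // Path syntax resolves to the inherent u32::to_ne_bytes, not this method.
        u32::to_ne_bytes(*self)
    }
}

// Without `Index<usize>`, `bytes[i]` is error E0608; without `Copy`, the
// element could not be moved out of `bytes`; `dbg!` needs `Debug`.
fn byte_at<T>(value: &T, i: usize) -> <T::Bytes as Index<usize>>::Output
where
    T: MyToBytes,
    T::Bytes: Index<usize>,
    <T::Bytes as Index<usize>>::Output: Copy + Debug,
{
    let bytes = value.to_ne_bytes();
    // `bytes[i]` desugars to `*bytes.index(i)`.
    dbg!(bytes[i])
}

fn main() {
    // Which byte comes first depends on the platform's endianness.
    println!("{}", byte_at(&0x0102_0304_u32, 0));
    println!("{}", byte_at(&7_u8, 0));
}
```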

ogeon · March 11, 2024, 2:44pm

Not exactly, but almost. The ToBytes trait doesn't specify what the byte representation is, so even though type_name_of_val can print the concrete type, your function cannot use it as an array. You could introduce a const parameter <T, const N: usize> and use it as the length in ToBytes<Bytes = [u8; N]>. Or add bounds as @vague suggests (I'm typing slowly on the phone).
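To illustrate the "not exactly" part, here is a sketch that again uses a hypothetical std-only `MyToBytes` trait in place of `ToBytes`. `std::any::type_name` is evaluated for each concrete type, so it can report `[u8; 2]`, while the generic body may only use what its bounds promise. The std docs do not guarantee the exact `type_name` string; the names below are what current compilers print, matching the `[u8; 8]` seen in the question.

```rust
use std::any::type_name;

// Hypothetical stand-in for num_traits::ToBytes (std only).
trait MyToBytes {
    type Bytes;
    fn to_ne_bytes(&self) -> Self::Bytes;
}

impl MyToBytes for u16 {
    type Bytes = [u8; 2];
    fn to_ne_bytes(&self) -> [u8; 2] {
        u16::to_ne_bytes(*self) // inherent method
    }
}

impl MyToBytes for u64 {
    type Bytes = [u8; 8];
    fn to_ne_bytes(&self) -> [u8; 8] {
        u64::to_ne_bytes(*self) // inherent method
    }
}

// type_name is computed per concrete T after monomorphization,
// so it can name the real array type...
fn bytes_type_name<T: MyToBytes>() -> &'static str {
    type_name::<T::Bytes>()
}

// ...but a generic body is type-checked once, against its bounds only.
// `T: MyToBytes` says nothing about indexing, so this fails with E0608:
//
// fn first_byte<T: MyToBytes>(value: &T) -> u8 {
//     value.to_ne_bytes()[0]
// }

fn main() {
    println!("{}", bytes_type_name::<u16>());
    println!("{}", bytes_type_name::<u64>());
}
```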

vague · March 11, 2024, 2:57pm

As @ogeon suggested, here's the code.

fn random<T, const N: usize>() -> T
where
    T: ToBytes<Bytes = [u8; N]> + Default,
// ... body as before

// at the call site, N must be given explicitly:
random::<usize, 8>()

Note: if you specify the wrong length, there will be a compiler error at the call site.

error[E0271]: type mismatch resolving `<usize as ToBytes>::Bytes == [u8; 4]`
  --> src/main.rs:38:29
   |
38 |     println!("{}", random::<usize, 4>())
   |                             ^^^^^ expected an array with a fixed size of 4 elements, found one with 8 elements
   |
note: required by a bound in `random`
  --> src/main.rs:23:16
   |
21 | fn random<T, const N: usize>() -> T
   |    ------ required by a bound in this function
22 | where
23 |     T: ToBytes<Bytes = [u8; N]> + Default,
   |                ^^^^^^^^^^^^^^^ required by this bound in `random`

So if you go this way, here's a way to get rid of the explicit const generic parameter.

trait ArrayLength {
    const N: usize;
}

impl ArrayLength for usize {
    const N: usize = 8;
}
// other impls for integers
// (or maybe some crate defines this, then you don't have to write them)

fn random<T>() -> T
where
    T: ArrayLength + ToBytes<Bytes = [u8; T::N]> + Default,
// T::N needs a nightly #![feature(generic_const_exprs)]

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=eba5574a5174930987d4a11722b293bc

jumpnbrownweasel · March 11, 2024, 4:47pm

That works for me when I fill in the FromBytes part.

fn random<T, const N: usize>() -> T
where
    T: ToBytes<Bytes = [u8; N]> + Default,
    T: FromBytes<Bytes = [u8; N]> + Default,
{
    let mut f = File::open("/dev/urandom").unwrap();
    let mut buf = ToBytes::to_ne_bytes(&T::default());
    f.read_exact(&mut buf).unwrap();
    FromBytes::from_ne_bytes(&buf)
}
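For reference, a self-contained version of this const-generic approach might look like the following. It swaps num_traits for hypothetical `MyToBytes`/`MyFromBytes` traits (implemented by a macro for a few integers), reads from any `Read` instead of a hard-coded `/dev/urandom`, and returns `io::Result` rather than unwrapping; all of these are assumptions of the sketch.

```rust
use std::io::{self, Read};

// Hypothetical std-only stand-ins for num_traits::{ToBytes, FromBytes}.
trait MyToBytes {
    type Bytes;
    fn to_ne_bytes(&self) -> Self::Bytes;
}

trait MyFromBytes: Sized {
    type Bytes;
    fn from_ne_bytes(bytes: &Self::Bytes) -> Self;
}

// Implement both traits for some integers; the inherent methods do the work.
macro_rules! impl_bytes {
    ($($t:ty),*) => {$(
        impl MyToBytes for $t {
            type Bytes = [u8; std::mem::size_of::<$t>()];
            fn to_ne_bytes(&self) -> <Self as MyToBytes>::Bytes {
                <$t>::to_ne_bytes(*self)
            }
        }
        impl MyFromBytes for $t {
            type Bytes = [u8; std::mem::size_of::<$t>()];
            fn from_ne_bytes(bytes: &<Self as MyFromBytes>::Bytes) -> Self {
                <$t>::from_ne_bytes(*bytes)
            }
        }
    )*};
}
impl_bytes!(u8, i16, u32, u64);

// N has to match T's width: read_as::<u32, 8> is a compile error at the call site.
fn read_as<T, const N: usize>(reader: &mut impl Read) -> io::Result<T>
where
    T: Default + MyToBytes<Bytes = [u8; N]> + MyFromBytes<Bytes = [u8; N]>,
{
    let mut buf = T::default().to_ne_bytes(); // a [u8; N] used as scratch space
    reader.read_exact(&mut buf)?;
    Ok(T::from_ne_bytes(&buf))
}

fn main() -> io::Result<()> {
    // Any Read works; a byte slice keeps the demo deterministic.
    let data = [1u8, 2, 3, 4, 5, 6];
    let mut reader: &[u8] = &data;
    let a = read_as::<u32, 4>(&mut reader)?;
    let b = read_as::<i16, 2>(&mut reader)?;
    println!("{a} {b}");
    Ok(())
}
```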

grothesque · March 11, 2024, 5:08pm

Thanks, everyone,

combining your suggestions and std::mem::size_of, here is what I was looking for:

#![feature(generic_const_exprs)]
#![allow(incomplete_features)]
use std::mem::size_of;
use std::fs::File;
use std::io::Read;
use num_traits::{ToBytes, FromBytes};

fn random<T>() -> T
where
    T: Default,
    T: ToBytes<Bytes = [u8; size_of::<T>()]>,
    T: FromBytes<Bytes = [u8; size_of::<T>()]>,
{
    let mut f = File::open("/dev/urandom").unwrap();
    let mut buf = ToBytes::to_ne_bytes(&T::default());
    f.read_exact(&mut buf).unwrap();
    FromBytes::from_ne_bytes(&buf)
}

fn main() {
    println!("{}", random::<i16>())
}

Interestingly, this requires an incomplete nightly-only feature to compile, so reading /dev/urandom into generic integers is not the total piece-of-cake exercise that it seemed to me at the outset.

I guess that currently there's no way to avoid an explicit N parameter other than using feature(generic_const_exprs) or adding a trait like ArrayLength above (that needs to be implemented for each concrete type)?

grothesque · March 11, 2024, 5:18pm

I know this, of course. But I've grown used to the compiler explicitly suggesting missing where clauses, and here it just said:

error[E0608]: cannot index into a value of type `<T as num_traits::ToBytes>::Bytes`

Together with type_name_of_val saying that it's an array of u8, the old C++ programmer in me (old-style generic C++ functions do not verify "traits", they just duck-type) got confused.

ogeon · March 11, 2024, 5:58pm

I think @vague was on a better track in this case. read_exact doesn't need a fixed-size array, only a mutable slice (see the std::io::Read docs), so you can probably put <T as ToBytes>::Bytes: AsMut<[u8]> (see std::convert::AsMut) in your where clause, skip the explicit array requirement, and call f.read_exact(buf.as_mut()).unwrap();. That way, you only ask for something that you can use as a writable sequence of bytes of any length. In fact, the Bytes associated type is already required to implement that trait and a bunch more via the NumBytes trait (num_traits::ops::bytes::NumBytes), so the bound may already be implied, if I'm not dreaming that up.

For the conversion back, you can have T: FromBytes<Bytes = <T as ToBytes>::Bytes> to tell it that they have to be the same type. Then you shouldn't need to specify what it is for the conversion to just work. The <T as ToBytes> part disambiguates the following ::Bytes part, so it knows which Bytes it refers to.

No nightly features required this way.
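A sketch of this suggestion, once more with hypothetical std-only traits in place of num_traits. Here the `AsMut<[u8]>` bound is declared on the associated type itself (mirroring what is described above for `NumBytes`), so it is implied by `T: MyToBytes`; `read_as` never names an array length and needs no nightly feature.

```rust
use std::io::{self, Read};

// Hypothetical stand-ins. The bound lives on the associated type, so every
// `T: MyToBytes` implies `<T as MyToBytes>::Bytes: AsMut<[u8]>`.
trait MyToBytes {
    type Bytes: AsMut<[u8]>;
    fn to_ne_bytes(&self) -> Self::Bytes;
}

trait MyFromBytes: Sized {
    type Bytes;
    fn from_ne_bytes(bytes: &Self::Bytes) -> Self;
}

macro_rules! impl_bytes {
    ($($t:ty),*) => {$(
        impl MyToBytes for $t {
            type Bytes = [u8; std::mem::size_of::<$t>()];
            fn to_ne_bytes(&self) -> <Self as MyToBytes>::Bytes {
                <$t>::to_ne_bytes(*self) // inherent method
            }
        }
        impl MyFromBytes for $t {
            type Bytes = [u8; std::mem::size_of::<$t>()];
            fn from_ne_bytes(bytes: &<Self as MyFromBytes>::Bytes) -> Self {
                <$t>::from_ne_bytes(*bytes) // inherent method
            }
        }
    )*};
}
impl_bytes!(u8, u16, i32, u64);

fn read_as<T>(reader: &mut impl Read) -> io::Result<T>
where
    T: Default + MyToBytes,
    // Same type on both sides, so the buffer we fill can be handed back.
    T: MyFromBytes<Bytes = <T as MyToBytes>::Bytes>,
{
    // Any value's bytes give a buffer of the right length to overwrite.
    let mut buf = MyToBytes::to_ne_bytes(&T::default());
    reader.read_exact(buf.as_mut())?;
    Ok(T::from_ne_bytes(&buf))
}

fn main() -> io::Result<()> {
    let data = [0xAA_u8, 0xBB, 0x01, 0x02, 0x03, 0x04];
    let mut reader: &[u8] = &data;
    let x = read_as::<u16>(&mut reader)?;
    let y = read_as::<i32>(&mut reader)?;
    println!("{x:#x} {y:#x}");
    Ok(())
}
```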

grothesque · March 12, 2024, 9:14pm

Thanks for the suggestions. It works indeed:
(This version is also compatible with BufReader, for much better performance when called repeatedly.)

use std::fs::File;
use std::io::Read;
use num_traits::{ToBytes, FromBytes};

fn read_as<T>(file: &mut impl Read) -> T
where
    T: Default + ToBytes,
    <T as ToBytes>::Bytes: AsMut<[u8]>,
    T: FromBytes<Bytes = <T as ToBytes>::Bytes>,
{
    let mut buf = ToBytes::to_ne_bytes(&T::default());
    file.read_exact(buf.as_mut()).unwrap();
    FromBytes::from_ne_bytes(&buf)
}

fn main() {
    let mut f = File::open("/dev/urandom").unwrap();
    println!("{}", read_as::<i16>(&mut f))
}

To me, this was quite a good exercise in "where" clauses.
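To make the BufReader remark concrete, here is a sketch of reading several values through one buffered reader. The helper `read_many` and the stand-in traits are hypothetical; the point is that `read_as` only needs `impl Read`, so a `BufReader` around a `File` (such as `/dev/urandom`) can be passed in unchanged.

```rust
use std::io::{self, BufReader, Read};

// Hypothetical stand-ins for num_traits::{ToBytes, FromBytes}, u16 only.
trait MyToBytes {
    type Bytes: AsMut<[u8]>;
    fn to_ne_bytes(&self) -> Self::Bytes;
}

trait MyFromBytes: Sized {
    type Bytes;
    fn from_ne_bytes(bytes: &Self::Bytes) -> Self;
}

impl MyToBytes for u16 {
    type Bytes = [u8; 2];
    fn to_ne_bytes(&self) -> [u8; 2] {
        u16::to_ne_bytes(*self) // inherent method
    }
}

impl MyFromBytes for u16 {
    type Bytes = [u8; 2];
    fn from_ne_bytes(bytes: &[u8; 2]) -> Self {
        u16::from_ne_bytes(*bytes) // inherent method
    }
}

fn read_as<T>(reader: &mut impl Read) -> io::Result<T>
where
    T: Default + MyToBytes + MyFromBytes<Bytes = <T as MyToBytes>::Bytes>,
{
    let mut buf = MyToBytes::to_ne_bytes(&T::default());
    reader.read_exact(buf.as_mut())?;
    Ok(T::from_ne_bytes(&buf))
}

// Wrap the source once, then read `count` values. Small read_exact calls are
// served from BufReader's internal buffer, so the underlying source sees
// fewer, larger reads instead of one read per value.
fn read_many<T>(source: impl Read, count: usize) -> io::Result<Vec<T>>
where
    T: Default + MyToBytes + MyFromBytes<Bytes = <T as MyToBytes>::Bytes>,
{
    let mut reader = BufReader::new(source);
    (0..count).map(|_| read_as::<T>(&mut reader)).collect()
}

fn main() -> io::Result<()> {
    // On Unix this could be File::open("/dev/urandom")?; a slice keeps it portable.
    let data: Vec<u8> = (0..8).collect();
    let values = read_many::<u16>(&data[..], 4)?;
    println!("{values:?}");
    Ok(())
}
```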

Hmm, just curious, in which way? I thought that my previous attempt was a realization of his suggestion to use Bytes = [u8; N]. I just replaced this by Bytes = [u8; size_of::<T>()].

ogeon · March 12, 2024, 10:12pm

Mainly because it doesn't rely on currently unstable features (which you asked about), but also because it's less strict. It doesn't specify more than what you need for the function to function. The fact that they are fixed-size arrays isn't important for writing to them.

This is a bit of a tangent, but it's a matter of only asking for what will be used. That makes sure it will work in as many cases as possible, including for types you have never seen or heard of. It's reasonable to expect an array in this specific case, but I think Vec<u8> would fit as well. Now, why someone would use it is a different question.
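The claim that `Vec<u8>` would fit can be checked with the same permissive bounds. In this sketch the stand-in traits (hypothetical, std only) require nothing of `Bytes` beyond `AsMut<[u8]>`; the made-up `Rgb` type uses a `Vec<u8>` as its byte form while `u8` uses an array, and one `read_as` serves both.

```rust
use std::io::{self, Read};

// Hypothetical stand-ins; the only requirement on Bytes is "writable bytes".
trait MyToBytes {
    type Bytes: AsMut<[u8]>;
    fn to_ne_bytes(&self) -> Self::Bytes;
}

trait MyFromBytes: Sized {
    type Bytes;
    fn from_ne_bytes(bytes: &Self::Bytes) -> Self;
}

fn read_as<T>(reader: &mut impl Read) -> io::Result<T>
where
    T: Default + MyToBytes + MyFromBytes<Bytes = <T as MyToBytes>::Bytes>,
{
    let mut buf = MyToBytes::to_ne_bytes(&T::default());
    reader.read_exact(buf.as_mut())?;
    Ok(T::from_ne_bytes(&buf))
}

// Hypothetical type whose byte form is a Vec<u8> rather than an array.
#[derive(Debug, Default, PartialEq)]
struct Rgb {
    r: u8,
    g: u8,
    b: u8,
}

impl MyToBytes for Rgb {
    type Bytes = Vec<u8>;
    fn to_ne_bytes(&self) -> Vec<u8> {
        vec![self.r, self.g, self.b]
    }
}

impl MyFromBytes for Rgb {
    type Bytes = Vec<u8>;
    fn from_ne_bytes(bytes: &Vec<u8>) -> Self {
        Rgb { r: bytes[0], g: bytes[1], b: bytes[2] }
    }
}

// An array-based impl works with the very same read_as.
impl MyToBytes for u8 {
    type Bytes = [u8; 1];
    fn to_ne_bytes(&self) -> [u8; 1] {
        [*self]
    }
}

impl MyFromBytes for u8 {
    type Bytes = [u8; 1];
    fn from_ne_bytes(bytes: &[u8; 1]) -> Self {
        bytes[0]
    }
}

fn main() -> io::Result<()> {
    let mut reader: &[u8] = &[10, 20, 30, 7];
    let color = read_as::<Rgb>(&mut reader)?;
    let extra = read_as::<u8>(&mut reader)?;
    println!("{color:?} {extra}");
    Ok(())
}
```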

This comes back to how trait bounds are a set of constraints that limits the set of types that can be used in place of the type parameter. The more traits that are added, and the more specific they are, the more the set is constrained. This means that what you generally want to do is to have permissive bounds and apply them close to where they are used.

If you have them on a generic struct definition (struct Foo<T: A + B>(T)), it becomes impossible to construct a value of it unless all of them can be fulfilled. That can look like a good thing at first, but it's overkill and too limiting if it doesn't use both traits together. It's also a breaking change to add another trait to it.

If you have them on the impl block or the function, you limit the requirements to that area. Then it's possible to only use the subset of functionality that works for the specific type that's substituted in. You're also free to add more functions or impl blocks with entirely different trait bounds.
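A small sketch of the two placements, with made-up types `Strict`, `Relaxed` and `Token`: bounds on the struct definition reject `Token` outright (and have to be repeated on every impl block), while per-impl bounds only gate the methods that need them.

```rust
use std::fmt::Debug;

// Bounds on the definition: every use of Strict<T> must satisfy both traits,
// even code that never clones or prints, and each impl must repeat them.
struct Strict<T: Clone + Debug>(T);

impl<T: Clone + Debug> Strict<T> {
    fn get(&self) -> &T {
        &self.0
    }
}

// No bounds on the definition; each impl block asks only for what it uses.
struct Relaxed<T>(T);

impl<T> Relaxed<T> {
    fn new(value: T) -> Self {
        Relaxed(value)
    }
    fn into_inner(self) -> T {
        self.0
    }
}

impl<T: Clone> Relaxed<T> {
    fn duplicate(&self) -> (T, T) {
        (self.0.clone(), self.0.clone())
    }
}

impl<T: Debug> Relaxed<T> {
    fn describe(&self) -> String {
        format!("Relaxed({:?})", self.0)
    }
}

// Neither Clone nor Debug.
struct Token;

fn main() {
    // Fine: new/into_inner have no bounds, so Token is accepted.
    let _token: Token = Relaxed::new(Token).into_inner();

    // Would not compile: Strict<Token> requires Token: Clone + Debug.
    // let s = Strict(Token);

    let n = Relaxed::new(5);
    println!("{:?} {}", n.duplicate(), n.describe());
    println!("{}", Strict(String::from("ok")).get());
}
```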

You kind of get this "for free" in C++, since templates (simply put) accept whatever fits the syntax, as you know. The mindset of only asking for what you use, combined with traits that represent "capabilities" (like Add, Read, etc.), gives you something that's still quite flexible, but with the function type-checked before it's used.

grothesque · March 12, 2024, 10:37pm

Here is what @vague proposed (it already requires nightly): Rust Playground
My second attempt was just a cleanup of that, where I replaced the ArrayLength trait with size_of.

(But I don't want to descend into exegesis of our own posts here. I was just curious why you said that @vague's solution was on the right track, while mine was just a simple continuation of it.)

I fully agree with the principle of only requiring generic parameters to fulfill what is necessary. As we saw above, this can be quite complicated and verbose, so compared to the duck typing of C++ templates, there's a price to be paid for robustness and better error messages.

ogeon · March 12, 2024, 10:57pm

Ah, I was thinking of the first response, which only requires the Index trait (post #2 by @vague).

It's similar to the tradeoffs of static vs. dynamic typing, but on a meta level. It's not all positives, but they outweigh the negatives most of the time, IMO. Especially when writing libraries.

grothesque · March 13, 2024, 7:05am

What I still do not understand in the following function is how the associated type is determined by the compiler.

The bounds specified after where are implicit requirements that must be fulfilled by the type T and by the associated type <T as ToBytes>::Bytes (which one of the bounds restricts to be the same as <T as FromBytes>::Bytes). Now the type T is fixed explicitly whenever the function is called, but the associated type is not. This is different from the other solution. Does the compiler search all possible types? Could there be a situation where the concrete associated type is ambiguous, because another type exists that fulfills all the requirements?

Cerber-Ursi · March 13, 2024, 7:07am

It is uniquely defined by T, since one can implement any trait only once for each type.
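A sketch of why no search is involved, using a hypothetical `ByteRepr` trait shaped like `ToBytes`. Because a type can implement a given trait at most once, fixing `T` pins down the associated type. A generic trait parameter, by contrast, allows several impls per type, and that is where ambiguity (and the need for annotations) comes from.

```rust
// Hypothetical trait with an associated type, shaped like ToBytes::Bytes.
trait ByteRepr {
    type Bytes;
    fn to_bytes(&self) -> Self::Bytes;
}

impl ByteRepr for u16 {
    type Bytes = [u8; 2];
    fn to_bytes(&self) -> [u8; 2] {
        self.to_le_bytes()
    }
}

// A second `impl ByteRepr for u16 { type Bytes = Vec<u8>; ... }` would be
// rejected (E0119, conflicting implementations). So once T = u16 is fixed,
// <u16 as ByteRepr>::Bytes has exactly one answer: the compiler looks up
// that single impl instead of searching for types that fit the bounds.
fn bytes_of_u16(x: u16) -> <u16 as ByteRepr>::Bytes {
    x.to_bytes()
}

// Contrast: with a generic *parameter*, one type can have many impls,
// so the caller has to choose (here through the expected type).
trait IntoRepr<R> {
    fn into_repr(self) -> R;
}

impl IntoRepr<[u8; 2]> for u16 {
    fn into_repr(self) -> [u8; 2] {
        self.to_le_bytes()
    }
}

impl IntoRepr<Vec<u8>> for u16 {
    fn into_repr(self) -> Vec<u8> {
        self.to_le_bytes().to_vec()
    }
}

fn main() {
    let a: [u8; 2] = bytes_of_u16(0x0102); // the only possible Bytes type
    let v: Vec<u8> = 0x0102_u16.into_repr(); // chosen by the annotation
    println!("{a:?} {v:?}");
}
```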

system · June 11, 2024, 7:08am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.
