The following describes a nut that I've been trying to crack for a couple of hours. Perhaps someone can help me out? (The following code came up in my exercises while learning Rust. I'm aware of the existence of crates for random number generation.)
In the following snippet, the function random_usize reads a usize value from /dev/urandom. I'm trying to come up with a version of that function that is generic over integers. To achieve that, I want to use the traits num_traits::{ToBytes, FromBytes}.
In the generic function random (which compiles but doesn't work properly yet), I manage to get bytes into buf, and type_name_of_val shows its type as [u8; 8]. But it's not even possible to index that "buffer" (see the commented-out line), whereas it is possible to index the regular array other, which seemingly has an equivalent type.
Is this another case of type_name_of_val showing the concrete type of an opaque type?
If I uncomment the line calling f.read_exact(...), the compiler suggests constraining the associated type Bytes of the ToBytes trait. So, OK, I can write T: ToBytes<Bytes = [u8; 8]> + Default, and then random runs for usize and for u64, but not for integer types of any other size. (Note that ToBytes<Bytes = [u8]> doesn't work.)
How can I make random generic over arbitrary integers?
use std::fs::File;
use std::io::Read;
use num_traits::{ToBytes, FromBytes};
// helper
fn type_name_of_val<T: ?Sized>(_val: &T) -> &'static str {
    std::any::type_name::<T>()
}
// this works
fn random_usize() -> usize {
    let mut f = File::open("/dev/urandom").unwrap();
    let mut buf = 0usize.to_ne_bytes();
    f.read_exact(&mut buf).unwrap();
    usize::from_ne_bytes(buf)
}
// this doesn't
fn random<T>() -> T
where
    T: ToBytes + Default,
{
    let mut f = File::open("/dev/urandom").unwrap();
    let mut buf = ToBytes::to_ne_bytes(&T::default());
    let other = [0u8, 1u8, 2u8, 3u8];
    dbg!(type_name_of_val(&buf)); // type_name_of_val(&buf) = "[u8; 8]"
    dbg!(type_name_of_val(&other)); // type_name_of_val(&other) = "[u8; 4]"
    dbg!(other[3]); // this works
    // dbg!(buf[3]); // <-- error here -->
    // f.read_exact(&mut buf).unwrap();
    // FromBytes::from_ne_bytes(&buf)
    T::default() // mock-up return value
}
fn main() {
    println!("{}", random::<usize>())
}
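The type_name_of_val observation can be reproduced in isolation. The sketch below is only an illustration: the trait NeBytes and the function bytes_type_name are hypothetical stand-ins (not num_traits API), so it runs with std alone. The body only knows that T::Bytes is Debug, yet type_name, which is resolved for each concrete T, still reports the concrete array type.

```rust
use std::any::type_name;
use std::fmt::Debug;

// Hypothetical stand-in for num_traits::ToBytes, reduced to one method so the
// sketch runs with std only.
trait NeBytes {
    type Bytes: Debug;
    fn ne_bytes(&self) -> Self::Bytes;
}

impl NeBytes for u16 {
    type Bytes = [u8; 2];
    fn ne_bytes(&self) -> [u8; 2] {
        self.to_ne_bytes()
    }
}

impl NeBytes for u64 {
    type Bytes = [u8; 8];
    fn ne_bytes(&self) -> [u8; 8] {
        self.to_ne_bytes()
    }
}

// Inside the body, buf's type is only known as `T::Bytes: Debug`, so
// `buf[0]` would be rejected (E0608). type_name is evaluated for each
// concrete T, which is why it reports the actual array type.
fn bytes_type_name<T: NeBytes>(value: T) -> &'static str {
    let buf = value.ne_bytes();
    let _shown = format!("{:?}", buf); // Debug is the only capability requested
    type_name::<T::Bytes>()
}

fn main() {
    println!("{}", bytes_type_name(0u16)); // prints [u8; 2]
    println!("{}", bytes_type_name(0u64)); // prints [u8; 8]
}
```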
In generic code, the only methods available are those provided by the trait bounds. The [n] operation needs the Index trait, and because container[index] is actually syntactic sugar for *container.index(index), using the result by value moves it out of the container. For your case a copy is acceptable, so you'll also need Copy. And the dbg! macro needs Debug.
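use std::fmt::Debug;
use std::ops::Index;
fn random<T>() -> T
where
    T: ToBytes + Default,
    <T as ToBytes>::Bytes: Index<usize>,
    <<T as num_traits::ToBytes>::Bytes as Index<usize>>::Output: Copy + Debug,
{
    // ... same body as in the question; dbg!(buf[3]) now compiles
}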
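To see why exactly these three bounds are needed, here is a small std-only sketch with the same bounds on an arbitrary container (element_at is a hypothetical helper, not part of the original code). It also spells out the desugaring of c[i] explicitly.

```rust
use std::fmt::Debug;
use std::ops::Index;

// Works for any container indexable by usize: arrays, slices, Vec, ...
fn element_at<C>(c: &C, i: usize) -> C::Output
where
    C: Index<usize> + ?Sized,
    C::Output: Copy + Debug,
{
    // `c[i]` desugars to `*c.index(i)`; taking the element by value copies it
    // out of the container, hence the `Output: Copy` bound.
    let sugar = c[i];
    let desugared = *c.index(i);
    dbg!(sugar, desugared); // dbg! needs Debug
    sugar
}

fn main() {
    let arr = [10u8, 20, 30, 40];
    println!("{}", element_at(&arr, 3)); // prints 40 (dbg! also logs to stderr)
}
```

Dropping the Copy bound makes `let sugar = c[i];` fail with "cannot move out of index", and dropping Debug makes the dbg! line fail.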
Not exactly, but almost. The ToBytes trait doesn't specify what the byte representation is, so even though you can print its type name, your function cannot use it as an array. You could introduce a const generic parameter <T, const N: usize> and use it as the length in ToBytes<Bytes = [u8; N]>. Or restrict it as @vague suggests (I'm typing slowly on the phone).
fn random<T, const N: usize>() -> T
where
    T: ToBytes<Bytes = [u8; N]> + Default,
{
    // ...
}
// at the call site, N has to be given explicitly:
random::<usize, 8>()
Note: if you specify the wrong length, you get a compiler error at the call site:
error[E0271]: type mismatch resolving `<usize as ToBytes>::Bytes == [u8; 4]`
  --> src/main.rs:38:29
   |
38 |     println!("{}", random::<usize, 4>())
   |                             ^^^^^ expected an array with a fixed size of 4 elements, found one with 8 elements
   |
note: required by a bound in `random`
  --> src/main.rs:23:16
   |
21 | fn random<T, const N: usize>() -> T
   |    ------ required by a bound in this function
22 | where
23 |     T: ToBytes<Bytes = [u8; N]> + Default,
   |                ^^^^^^^^^^^^^^^ required by this bound in `random`
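A complete, runnable version of the const-generic approach could look like the sketch below. Assumptions: FromNeBytes and its macro are hypothetical stand-ins for num_traits::FromBytes so that only std is needed; the function reads from any `impl Read` instead of opening /dev/urandom itself and returns io::Result instead of unwrapping. With N in scope, the buffer can be created as [0u8; N] directly, so Default is no longer needed.

```rust
use std::io::{self, Read};

// Hypothetical stand-in for num_traits::FromBytes, so the sketch runs with the
// standard library only. Each integer's Bytes is its native byte array.
trait FromNeBytes: Sized {
    type Bytes;
    fn from_ne(bytes: &Self::Bytes) -> Self;
}

macro_rules! impl_from_ne_bytes {
    ($($t:ty),*) => {$(
        impl FromNeBytes for $t {
            type Bytes = [u8; std::mem::size_of::<$t>()];
            fn from_ne(bytes: &Self::Bytes) -> Self {
                <$t>::from_ne_bytes(*bytes)
            }
        }
    )*};
}

impl_from_ne_bytes!(u8, u16, u32, u64, i8, i16, i32, i64, usize);

// N is spelled out by the caller and must equal the byte length of T;
// otherwise the call site fails with E0271, as in the error above.
fn read_as<T, const N: usize>(r: &mut impl Read) -> io::Result<T>
where
    T: FromNeBytes<Bytes = [u8; N]>,
{
    let mut buf = [0u8; N];
    r.read_exact(&mut buf)?;
    Ok(T::from_ne(&buf))
}

fn main() -> io::Result<()> {
    // Any Read works; a byte slice keeps the example deterministic.
    let data = 0x1234u16.to_ne_bytes();
    let x = read_as::<u16, 2>(&mut &data[..])?;
    println!("{:#x}", x); // prints 0x1234
    Ok(())
}
```

For random numbers, the reader would be File::open("/dev/urandom")?, as in the question.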
So if you go this way, here's a way to get rid of the const generic parameter:
trait ArrayLength {
    const N: usize;
}
impl ArrayLength for usize {
    const N: usize = std::mem::size_of::<usize>(); // 8 on 64-bit targets
}
// other impls for integers
// (or maybe some crate defines this, then you don't have to write them)
fn random<T>() -> T
where
    T: ArrayLength + ToBytes<Bytes = [u8; T::N]> + Default,
    // T::N needs a nightly #![feature(generic_const_exprs)]
{
    // ...
}
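The "other impls for integers" can be generated with a macro. The sketch below assumes a hypothetical impl_array_length! macro (not from any crate) and uses std::mem::size_of so each N stays correct on every target. Everything in it compiles on stable: T::N can be used as a run-time value in generic code; only using it as an array length in a type, like [u8; T::N], needs generic_const_exprs.

```rust
use std::mem::size_of;

// The ArrayLength trait from the answer.
trait ArrayLength {
    const N: usize;
}

// Hypothetical helper macro: one impl per integer type, N taken from size_of.
macro_rules! impl_array_length {
    ($($t:ty),*) => {$(
        impl ArrayLength for $t {
            const N: usize = size_of::<$t>();
        }
    )*};
}

impl_array_length!(u8, u16, u32, u64, u128, usize, i8, i16, i32, i64, i128, isize);

// Run-time use of T::N is fine on stable (Vec length, not an array type).
fn zeroed_buffer<T: ArrayLength>() -> Vec<u8> {
    vec![0u8; T::N]
}

fn main() {
    println!("{}", <u16 as ArrayLength>::N); // prints 2
    println!("{}", zeroed_buffer::<u64>().len()); // prints 8
}
```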
Combining your suggestions and std::mem::size_of, here is what I was looking for:
#![feature(generic_const_exprs)]
#![allow(incomplete_features)]
use std::mem::size_of;
use std::fs::File;
use std::io::Read;
use num_traits::{ToBytes, FromBytes};
fn random<T>() -> T
where
    T: Default,
    T: ToBytes<Bytes = [u8; size_of::<T>()]>,
    T: FromBytes<Bytes = [u8; size_of::<T>()]>,
{
    let mut f = File::open("/dev/urandom").unwrap();
    let mut buf = ToBytes::to_ne_bytes(&T::default());
    f.read_exact(&mut buf).unwrap();
    FromBytes::from_ne_bytes(&buf)
}
fn main() {
    println!("{}", random::<i16>())
}
Interestingly, this requires an incomplete, nightly-only feature to compile, so reading /dev/urandom into generic integers is not quite the piece-of-cake exercise it seemed to me at the outset.
I guess there's currently no way to avoid an explicit N parameter other than using feature(generic_const_exprs) or adding a trait like ArrayLength above (which needs to be implemented for each concrete type)?
I know this, of course. But I've grown used to the compiler explicitly suggesting missing where clauses, and here it just said:
error[E0608]: cannot index into a value of type `<T as num_traits::ToBytes>::Bytes`
Together with type_name_of_val saying that it's an array of u8, this confused the old C++ programmer in me (old-style generic C++ functions do not verify "traits"; they just duck-type).
I think @vague was on a better track in this case. read_exact doesn't need a fixed-size array, only a slice (see std::io::Read), so you can probably put T::Bytes: AsMut<[u8]> (see std::convert::AsMut) in your where clause, skip the explicit array requirement, and call f.read_exact(buf.as_mut()).unwrap(). That way, you only ask for something that you can use as a writable sequence of bytes of any length. In fact, the Bytes associated type is already required to implement that trait and a bunch more via the NumBytes trait (see num_traits::ops::bytes::NumBytes), so the bound may already be implied, if I'm not dreaming that up.
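The point that read_exact only needs a writable slice can be checked with std alone. In the sketch below, fill_from is a hypothetical helper that asks for nothing but AsMut<[u8]>, so it accepts a fixed-size array as well as a Vec<u8>. (As far as I know, num_traits's NumBytes lists AsMut<[u8]> among its supertraits, which would make the bound implied for ToBytes::Bytes; the sketch does not depend on that.)

```rust
use std::io::{self, Read};

// Only what read_exact actually needs: something viewable as &mut [u8].
fn fill_from<B: AsMut<[u8]>>(mut buf: B, r: &mut impl Read) -> io::Result<B> {
    r.read_exact(buf.as_mut())?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // A byte slice implements Read and advances as it is consumed.
    let mut src: &[u8] = &[1, 2, 3, 4, 5, 6];
    let arr = fill_from([0u8; 2], &mut src)?;
    let vec = fill_from(vec![0u8; 3], &mut src)?;
    println!("{:?} {:?}", arr, vec); // prints [1, 2] [3, 4, 5]
    Ok(())
}
```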
For the conversion back, you can write T: FromBytes<Bytes = <T as ToBytes>::Bytes> to tell the compiler that the two associated types must be the same type. Then you shouldn't need to specify what that type is for the conversion to just work. The <T as ToBytes> part disambiguates the following ::Bytes, so the compiler knows which trait's Bytes it refers to.
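A std-only sketch of the same pattern, assuming two hypothetical traits Encode and Decode that (like ToBytes and FromBytes) both declare an associated type named Bytes. With both bounds on T, a plain T::Bytes would be ambiguous (E0221), so the projection is written <T as Encode>::Bytes, and the equality bound lets the encoded value be passed straight to decode.

```rust
// Hypothetical traits mirroring the shape of num_traits::ToBytes / FromBytes.
trait Encode {
    type Bytes;
    fn encode(&self) -> Self::Bytes;
}

trait Decode: Sized {
    type Bytes;
    fn decode(bytes: &Self::Bytes) -> Self;
}

impl Encode for u32 {
    type Bytes = [u8; 4];
    fn encode(&self) -> [u8; 4] {
        self.to_ne_bytes()
    }
}

impl Decode for u32 {
    type Bytes = [u8; 4];
    fn decode(bytes: &[u8; 4]) -> Self {
        u32::from_ne_bytes(*bytes)
    }
}

// The equality bound ties Decode::Bytes to Encode::Bytes without naming the
// concrete type anywhere in the generic function.
fn round_trip<T>(value: &T) -> T
where
    T: Encode + Decode<Bytes = <T as Encode>::Bytes>,
{
    let bytes = value.encode();
    T::decode(&bytes)
}

fn main() {
    println!("{}", round_trip(&0xDEAD_BEEFu32)); // prints 3735928559
}
```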
Thanks for the suggestions. It indeed works:
(This version also works with a BufReader, giving much better performance when called repeatedly.)
use std::fs::File;
use std::io::Read;
use num_traits::{ToBytes, FromBytes};
fn read_as<T>(file: &mut impl Read) -> T
where
    T: Default + ToBytes,
    <T as ToBytes>::Bytes: AsMut<[u8]>,
    T: FromBytes<Bytes = <T as ToBytes>::Bytes>,
{
    let mut buf = ToBytes::to_ne_bytes(&T::default());
    file.read_exact(buf.as_mut()).unwrap();
    FromBytes::from_ne_bytes(&buf)
}
fn main() {
    let mut f = File::open("/dev/urandom").unwrap();
    println!("{}", read_as::<i16>(&mut f))
}
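The BufReader remark can be tried out without the crate. In this sketch, a hypothetical NeBytes trait stands in for num_traits::{ToBytes, FromBytes}, and a byte buffer stands in for File::open("/dev/urandom"); several values of different sizes are then read through one BufReader with the same read_as shape as above.

```rust
use std::io::{BufReader, Read};

// Hypothetical stand-in for num_traits::{ToBytes, FromBytes}.
trait NeBytes: Default {
    type Bytes: AsMut<[u8]>;
    fn to_ne(&self) -> Self::Bytes;
    fn from_ne(bytes: &Self::Bytes) -> Self;
}

macro_rules! impl_ne_bytes {
    ($($t:ty),*) => {$(
        impl NeBytes for $t {
            type Bytes = [u8; std::mem::size_of::<$t>()];
            fn to_ne(&self) -> Self::Bytes { self.to_ne_bytes() }
            fn from_ne(bytes: &Self::Bytes) -> Self { <$t>::from_ne_bytes(*bytes) }
        }
    )*};
}

impl_ne_bytes!(u8, u16, u32, u64, i16, i32, i64);

fn read_as<T: NeBytes>(r: &mut impl Read) -> T {
    // The default value only serves to obtain a correctly sized buffer.
    let mut buf = T::default().to_ne();
    r.read_exact(buf.as_mut()).unwrap();
    T::from_ne(&buf)
}

fn main() {
    let mut data = Vec::new();
    data.extend_from_slice(&7u16.to_ne_bytes());
    data.extend_from_slice(&(-1i32).to_ne_bytes());
    // BufReader batches many small reads into few reads on the source.
    let mut reader = BufReader::new(&data[..]);
    let a: u16 = read_as(&mut reader);
    let b: i32 = read_as(&mut reader);
    println!("{a} {b}"); // prints 7 -1
}
```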
To me, this was quite a good exercise in "where" clauses.
Hmm, just curious: in which way? I thought that my previous attempt was a realization of his suggestion to use Bytes = [u8; N]. I just replaced that with Bytes = [u8; size_of::<T>()].
Mainly because it doesn't rely on currently unstable features (which you asked about), but also because it's less strict. It doesn't specify more than what you need for the function to function. The fact that they are fixed-size arrays isn't important for writing to them.
This is a bit of a tangent, but it's a matter of only asking for what will be used. That makes sure it will work in as many cases as possible, including for types you have never seen or heard of. It's reasonable to expect an array in this specific case, but I think Vec<u8> would fit as well. Now, why someone would use it is a different question.
This comes back to how trait bounds are a set of constraints that limit the set of types that can be used in place of the type parameter. The more traits are added, and the more specific they are, the more the set is constrained. This means that what you generally want is permissive bounds, applied close to where they are used.
If you put them on a generic struct definition (struct Foo<T: A + B>(T)), it becomes impossible to construct a value of it unless all of them are fulfilled. That can look like a good thing at first, but it's overkill and too limiting if the struct doesn't use both traits together. Adding another trait to it later is also a breaking change.
If you have them on the impl block or the function, you limit the requirements to that area. Then it's possible to only use the subset of functionality that works for the specific type that's substituted in. You're also free to add more functions or impl blocks with entirely different trait bounds.
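A small std-only sketch of that advice, using a hypothetical Pair type: the struct itself has no bounds, and each impl block asks only for what its method uses, so a Pair of non-addable values can still be constructed and described.

```rust
use std::fmt::Display;
use std::ops::Add;

// No bounds on the struct itself: any T can be wrapped.
struct Pair<T>(T, T);

// Each capability is requested only by the impl block that needs it.
impl<T: Add<Output = T> + Copy> Pair<T> {
    fn sum(&self) -> T {
        self.0 + self.1
    }
}

impl<T: Display> Pair<T> {
    fn describe(&self) -> String {
        format!("({}, {})", self.0, self.1)
    }
}

fn main() {
    let nums = Pair(2, 3);
    let words = Pair("a", "b"); // &str is Display but not Add: still constructible
    println!("{} = {}", nums.describe(), nums.sum()); // prints (2, 3) = 5
    println!("{}", words.describe()); // prints (a, b)
    // words.sum() would not compile: &str does not implement Add.
}
```

With struct Pair<T: Add<Output = T> + Copy + Display>, the words value could not be created at all.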
You kind of get this "for free" in C++ by only accepting what fits the syntax (simply put), as you know. Using the mindset of only asking for what you use, and having traits that represent "capabilities" (like Add, Read, etc.), gives you something that's still quite flexible, but with the function type-checked before it's used.
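As an illustration of capability traits (sum_all is a hypothetical example, not from the thread): the function below asks only for Default, Add and Copy, and its body is checked against exactly those bounds at the definition, before any call exists. Writing acc * x inside it would be rejected even if every caller passed integers, because Mul is not among the bounds; a C++ template would only complain when instantiated.

```rust
use std::ops::Add;

// Default for the starting value, Add for combining, Copy to take elements
// out of the slice: nothing more is requested.
fn sum_all<T>(xs: &[T]) -> T
where
    T: Add<Output = T> + Copy + Default,
{
    xs.iter().fold(T::default(), |acc, &x| acc + x)
}

fn main() {
    println!("{}", sum_all(&[1u8, 2, 3])); // prints 6
    println!("{}", sum_all(&[0.5f64, 0.25])); // prints 0.75
}
```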
(But I don't want to descend into exegesis of our own posts here. I was just curious why you said that @vague's solution was on the right track, while mine was just a simple continuation of it.)
I fully agree with the principle of only asking generic parameters to fulfill what is necessary. As we saw above, this can be quite complicated and verbose, so compared to the duck typing of C++ templates, there's a price to be paid for robustness and better error messages.
It's similar to the tradeoffs of static vs. dynamic typing, but on a meta level. It's not all positives, but they outweigh the negatives most of the time, IMO. Especially when writing libraries.
What I still do not understand about the read_as function above is how the compiler determines the associated type.
The bounds specified after where are implicit requirements that must be fulfilled by the type T and by the associated type <T as ToBytes>::Bytes (which one of the bounds restricts to be the same as <T as FromBytes>::Bytes). Now the type T is fixed explicitly whenever the function is called, but the associated type is not. This is different from the other solution. Does the compiler search all possible types? Could there be a situation where the concrete associated type is ambiguous, because another type exists that fulfills all the requirements?
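As I understand the mechanism (a sketch, not an authoritative account of the compiler): coherence allows at most one impl of a given trait for a given type, so once T is fixed at the call site, <T as ToBytes>::Bytes is read from that single impl rather than searched for, and an equality bound like FromBytes<Bytes = <T as ToBytes>::Bytes> is then only checked against the two impls. The hypothetical trait ToNeBytes below makes that visible with TypeId.

```rust
use std::any::TypeId;

// Hypothetical trait with only an associated type, shaped like ToBytes.
trait ToNeBytes {
    type Bytes: 'static;
}

impl ToNeBytes for u16 {
    type Bytes = [u8; 2];
}

impl ToNeBytes for i64 {
    type Bytes = [u8; 8];
}

// A second `impl ToNeBytes for u16` would be a coherence error (E0119), so the
// projection below names exactly one type for each concrete T.
fn bytes_type_id<T: ToNeBytes + 'static>() -> TypeId {
    TypeId::of::<<T as ToNeBytes>::Bytes>()
}

fn main() {
    println!("{}", bytes_type_id::<u16>() == TypeId::of::<[u8; 2]>()); // prints true
    println!("{}", bytes_type_id::<i64>() == TypeId::of::<[u8; 8]>()); // prints true
}
```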