[u8; 8] to two [u8; 4]

Besides copying the u8s one by one, is there a way to split a [u8; 8] into two [u8; 4]?

Context: using u32::from_le_bytes, which, for some reason, wants an array instead of a slice.

The example for from_le_bytes in the u32 documentation shows the TryFrom/TryInto-based conversion route.

I'm trying to avoid unwrap() in my code. I don't understand why it needs to use TryFrom/TryInto.

It doesn't need unwrap, but I'm not sure if there is a way in std/core to do this without a fallible conversion (that won't be able to fail in your case if you write it neatly). There are certainly alternatives in the ecosystem.

Because slices don't encode their length in the type. So to go from &[T] to [T; 4] can fail.
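To make that fallibility concrete, here is a minimal sketch of the slice-based route that avoids unwrap() by returning an Option instead. The function name read_u32_le_pair is made up for illustration; only split_at, try_into and u32::from_le_bytes come from std.

```rust
// Explicit import so this also builds on editions before 2021,
// where TryInto is not yet in the prelude.
use std::convert::TryInto;

/// Hypothetical helper: reads two little-endian u32 values from a slice.
/// Returns None if the slice is not exactly 8 bytes long.
pub fn read_u32_le_pair(bytes: &[u8]) -> Option<(u32, u32)> {
    if bytes.len() != 8 {
        return None;
    }
    let (lo, hi) = bytes.split_at(4);
    // &[u8] -> [u8; 4] is fallible because a slice's length is only known
    // at run time; `.ok()?` turns a mismatch into None instead of a panic.
    let lo: [u8; 4] = lo.try_into().ok()?;
    let hi: [u8; 4] = hi.try_into().ok()?;
    Some((u32::from_le_bytes(lo), u32::from_le_bytes(hi)))
}
```

The conversions cannot actually fail after the length check, but the type system has no way to know that, which is why the signature still has to be fallible.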

On nightly they've started adding methods to arrays that use const generics to get sub-arrays with guaranteed lengths; currently there are methods called split_array_ref and split_array_mut.

However, they currently don't give you [T; N] -> ([T; M], [T; N-M]) because (I assume) const generics are not powerful enough yet.

I think we need further advances in const generics, and then non-fallible conversions can be implemented more easily. As it is now it either has to be type checked at runtime (incorrect array lengths panicking at runtime) or the trait needs to encode the type/array size correspondences ad hoc.

It's certainly possible to define a safe function that takes [T; 8] and returns [[T; 4]; 2] today. It's unfortunately too special case to be in libstd, since it can't be written with const generics for all compatible sizes.

Another example is the bytemuck crate, which allows this conversion with its cast function for arrays of u8. Just like split_array_ref, it panics on "runtime type checking", i.e. mismatching array lengths, unfortunately. A side effect of being defined in general terms.

Now this is just mostly my curiosity, but with array/slice patterns we have much better conversions, they are just not, unfortunately, generic in the size of the array.

These are "better conversions" since they are type checked (size of array is checked at compile time) and will fail to compile instead of panic if there is a problem.

(playground link)

The gist of it:

/// Take first four elements of an array
macro_rules! take4 {
    ($array:expr) => {
        match $array {
            [a, b, c, d, ..] => [a, b, c, d]
        }
    }
}

let data = [1, 2, 3, 4, 5, 6, 7, 8];    
u32::from_le_bytes(take4!(data))
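
A related sketch, assuming a Copy element type: inside an array pattern, a rest pattern with a binding (name @ ..) yields a fixed-size sub-array, so both halves can be pulled out without a macro while the lengths are still checked at compile time. The function name halves is hypothetical.

```rust
/// Hypothetical helper: both halves of an 8-byte array via sub-array patterns.
pub fn halves(data: [u8; 8]) -> ([u8; 4], [u8; 4]) {
    // `first @ ..` binds everything except the four ignored trailing
    // elements, so its type is [u8; 4].
    let [first @ .., _, _, _, _] = data;
    // Likewise, `second @ ..` binds everything after the four ignored
    // leading elements.
    let [_, _, _, _, second @ ..] = data;
    (first, second)
}
```

Changing the number of ignored elements so that the sub-array no longer has four elements makes this fail to compile rather than panic.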

In theory, the standard library should be able to expose a function/method which does the split using const generics and pointer math. The implementation is trivial.

(I had to introduce my own Array type because I can't directly add methods to the builtin array type. The LEN - INDEX in the return type also needs nightly with #![feature(generic_const_exprs)].)

/// A newtype around an array so we can give it methods.
struct Array<T, const LEN: usize>([T; LEN]);

impl<T, const LEN: usize> Array<T, LEN> {
    pub fn split_at<const INDEX: usize>(&self) -> (&[T; INDEX], &[T; LEN - INDEX]) {
        // Safety: const generics ensure our bounds checks are correct, and the
        // function signature makes sure we don't accidentally transmute lifetimes
        // incorrectly.
        //
        // We can also assume ptr.add() doesn't wrap around because otherwise
        // you wouldn't be able to get a reference to the last element in this
        // array normally.
        unsafe {
            let ptr = self.0.as_ptr();
            let head = ptr.cast::<[T; INDEX]>();
            let tail = ptr.add(INDEX).cast::<[T; LEN - INDEX]>();
            (&*head, &*tail)
        }
    }
}

(playground)

You would then use it like this:

fn main() {
    let array = Array([0_u8; 8]);

    let (first_half, second_half) = array.split_at::<4>();
    assert_eq!(first_half.len(), 4);
    assert_eq!(second_half.len(), 4);

    println!(
        "{}, {}",
        u32::from_le_bytes(*first_half),
        u32::from_le_bytes(*second_half),
    );

    // let _ = array.split_at::<10>(); // compile error
}

As a bonus, we get bounds checking at compile time!

error[E0080]: evaluation of `Array::<u8, 8_usize>::split_at::<10_usize>::{constant#1}` failed
 --> src/main.rs:7:70
  |
7 |     pub fn split_at<const INDEX: usize>(&self) -> (&[T; INDEX], &[T; LEN - INDEX]) {
  |                                                                      ^^^^^^^^^^^ attempt to compute `8_usize - 10_usize`, which would overflow

If that compiles today, you should propose it on the tracker/as an RFC to replace the current split_array variants, it's strictly more powerful.

fn convert(x: [u8;8]) -> [[u8;4];2] {
    unsafe {std::mem::transmute(x)}
}

If you want a tuple:

fn convert(x: [u8;8]) -> ([u8;4],[u8;4]) {
    let arr : [[u8;4];2] = unsafe {std::mem::transmute(x)};
    (arr[0],arr[1])
}

Both are sound.

That's your claim.

The advancement we need :slightly_smiling_face: is #![feature(generic_const_exprs)] (or a subset of it).

These are implemented by bytemuck too (as mentioned before), but type (size) checked at runtime, so one could just as well just wrap bytemuck::cast.

mem::transmute

The [[u8;4];2] case is sound because of the way arrays are laid out in memory.

The elements of an array, [T; N], are laid out sequentially, each one size_of::<T>() bytes after the previous one, with no padding in between. So [u8; 8] and [[u8; 4]; 2] are both just 8 contiguous bytes with alignment 1, and it's fine to transmute it that way. See Arrays and Slices in the unsafe code guidelines for more.
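
As a small sketch of that argument, the layout facts the transmute relies on can be spelled out as assertions next to it. The name convert_checked is hypothetical; the transmute itself is the same one posted earlier in the thread.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical variant of the earlier `convert`, with the layout
/// assumptions written down as assertions.
pub fn convert_checked(x: [u8; 8]) -> [[u8; 4]; 2] {
    // Both types occupy 8 bytes and have the alignment of u8.
    assert_eq!(size_of::<[u8; 8]>(), size_of::<[[u8; 4]; 2]>());
    assert_eq!(align_of::<[u8; 8]>(), align_of::<[[u8; 4]; 2]>());
    // Safety: source and target are both 8 contiguous u8 values with the
    // same alignment, and every bit pattern is a valid u8.
    unsafe { std::mem::transmute::<[u8; 8], [[u8; 4]; 2]>(x) }
}
```

Naming both types in the turbofish keeps the transmute from silently changing meaning if the surrounding signature is edited later.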

The tuple case isn't sound because a tuple is represented as something like this:

#[repr(Rust)]
struct Tuple<A, B> {
  first: A,
  second: B,
}

Because they are #[repr(Rust)] you can't make any assumptions about layout, including that the 0'th element in the tuple will be first in memory. See Tuple Types in the unsafe code guidelines for more.

If you want to return a tuple, you would need something like this:

fn split<T, const LEN: usize, const INDEX: usize>(
    array: [T; LEN],
) -> ([T; INDEX], [T; LEN - INDEX]) {

    #[repr(packed)]
    struct Tuple<T, const LEN: usize, const INDEX: usize>
    where
        [(); LEN - INDEX]:,
    {
        first: [T; INDEX],
        second: [T; LEN - INDEX],
    }

    unsafe {
        let Tuple { first, second }: Tuple<T, LEN, INDEX> = std::mem::transmute(array);
        (first, second)
    }
}

(playground)

That was actually my first attempt, but it's not great because a) you need to carry the T and LEN generic parameters around so using turbofish for the INDEX parameter gets a bit awkward (e.g. split::<_, _, 4>([0_u8; 8])), and b) it doesn't compile because generic_const_exprs is incomplete and the compiler thinks the array and Tuple<T, LEN, INDEX> types have different sizes ("dependently-sized types" is the bit to look out for).

warning: the feature `generic_const_exprs` is incomplete and may not be safe to use and/or cause compiler crashes
 --> src/lib.rs:1:12
  |
1 | #![feature(generic_const_exprs)]
  |            ^^^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(incomplete_features)]` on by default
  = note: see issue #76560 <https://github.com/rust-lang/rust/issues/76560> for more information

error[E0512]: cannot transmute between types of different sizes, or dependently-sized types
  --> src/lib.rs:17:61
   |
17 |         let Tuple { first, second }: Tuple<T, LEN, INDEX> = std::mem::transmute(array);
   |                                                             ^^^^^^^^^^^^^^^^^^^
   |
   = note: source type: `[T; LEN]` (this type does not have a fixed size)
   = note: target type: `Tuple<T, LEN, INDEX>` (size can vary because of [T; INDEX])

Actually the tuple case is also sound because it's still transmuting to a [[u8; 4]; 2] and then making a tuple out of it.

I wasn't saying it is all unsound, I was hinting that just saying "It is sound" shouldn't be enough to justify transmute, @krtab should have quoted exactly what you did now.


Haha, looks like I got caught not properly reading the code in question :sweat_smile:

I read it as a direct transmute from an array to a tuple.


I wasn't saying it is all unsound, I was hinting that just saying "It is sound" shouldn't be enough to justify transmute , @krtab should have quoted exactly what you did now.

I disagree that it is your place to judge the amount of work I owe to put into a response that was relevant to the OP's need. Next time, feel free to do yourself the (very welcome) work that @Michael-F-Bryan did.


Seriously, just do this:

let arr1 = [arr[0], arr[1], arr[2], arr[3]];
let arr2 = [arr[4], arr[5], arr[6], arr[7]];

If you're unwilling to use one of the via-slice routes, then one-by-one sounds great. Just use an array pattern and it's super-clear:

pub fn demo(x: [u8; 8]) -> [[u8; 4]; 2] {
    let [a, b, c, d, e, f, g, h] = x;
    [[a, b, c, d], [e, f, g, h]]
}

https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=ad90273626b5139a25033c99bb987cc1

And it compiles away to nothing:

define i64 @_ZN10playground4demo17h1f02f025652a6996E(i64 returned %0) unnamed_addr #0 {
start:
  ret i64 %0
}

Of course, there are other ways too that don't use .unwrap() but are (IMHO) less clear, like

pub fn demo2(x: [u8; 8]) -> [[u8; 4]; 2] {
    let x = u64::from_le_bytes(x);
    [u32::to_le_bytes(x as _), u32::to_le_bytes((x >> 32) as _)]
}
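
Since the original goal was two u32 values rather than two arrays, the same idea can skip the intermediate arrays entirely. A minimal sketch, with a hypothetical function name:

```rust
/// Hypothetical helper: reads two little-endian u32 values from 8 bytes.
pub fn two_u32_le(bytes: [u8; 8]) -> (u32, u32) {
    let x = u64::from_le_bytes(bytes);
    // The low 32 bits hold bytes 0..4 and the high 32 bits hold bytes 4..8;
    // `as u32` truncates to the low 32 bits.
    (x as u32, (x >> 32) as u32)
}
```

This gives the same values as calling u32::from_le_bytes on each half, independent of the host's endianness.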
