Is it safe to transmute Vec<Vec<char>> to Vec<Vec<u32>>?

The title really says it all. The layout of char and u32 should be identical so it would make sense if this were possible, but I know that there is no guarantee that the same struct with different generics will have the same field order, so I could also believe that it's UB. I tried it and it worked, but that doesn't prove anything.

Even if it is safe, I know transmute should only be used if there is no reasonable alternative. Is there a better way to convert a Vec<Vec<char>> to Vec<Vec<u32>> in O(1) time and O(1) space? If not, what about O(n) time and O(1) space?

1 Like

I am pretty sure it is UB.

What does Miri say?

I can't think of something that uses constant space. You can convert from Vec<char> for Vec<u32>, using the technique suggested in the docs for Vec::from_raw_parts.

A straightforward

x.into_iter().map(|v| v.into_iter().map(|c| c as u32).collect()).collect()

manages to run in O(1) space [1], and optimize out the inner loop, so it’s O(n) where n is the size of the outer vec only. E.g., you can see that these two functions generate the same assembly:

pub fn convert1(x: Vec<Vec<char>>) -> Vec<Vec<u32>> {
    x.into_iter().map(|v| v.into_iter().map(|c| c as u32).collect()).collect()
}

pub fn convert2(x: Vec<Vec<char>>) -> Vec<Vec<char>> {
    x.into_iter().map(|v| v).collect()
}

  1. there’s specialization on .collect() that makes such a chain avoid allocating a new Vec ↩︎

11 Likes

I don't know the answer to your question. I would think the sane thoughts you are and have the same question.

But I'm wondering whether there is any other approach that doesn't involve doubt the conversation at all? If there is an external interface that requires specifically Vec<Vec<u32>> I guess you have no choice, but if you have some level of control, could you maybe adjust the interface to use iterators or to be generic in T: Copy + Into<u32>?

Aren't we re-allocating the outer vector? So isn't it O(n) space, where n is the size of the outer vector?

I don’t think we are, no. As I’ve noted (after an edit) in the footnote, there’s specialization on .collect() that makes chains such as .into_iter().map(…).collect() avoid allocating a new Vec (as long as the size and alignment of the item types match, which they should between Vec<char> and Vec<u32>).

10 Likes

I heard about that, it's why this code can run in constant time and space:

pub fn convert_to_u32(vec: Vec<char>) -> Vec<u32> {
    vec.into_iter().map(|c| c as u32).collect()
}

Here's the generated code:

example::convert_to_u32:
        mov     rax, rdi
        mov     rcx, qword ptr [rsi]
        movups  xmm0, xmmword ptr [rsi + 8]
        mov     qword ptr [rdi], rcx
        movups  xmmword ptr [rdi + 8], xmm0
        ret

I think it can reuse the allocation even if the sizes don't match, as long as the new type isn't larger.

3 Likes

Off topic: how did you create the footnote?

It’s

Some text ^[with a footnote] for you :-)

Some text [1] for you :slight_smile:


  1. with a footnote ↩︎

4 Likes

How does this specialization work? Does it do a transmute of the underlying buffer pointers?

It involves traits such as this one (unfortunately hidden in newer std docs); some of the implementation then is here. See also this comment.

Also here's a link for the recent hidden module-level documentation.

3 Likes
pub fn convert2(x: Vec<Vec<char>>) -> Vec<Vec<char>> {
    x.into_iter().map(|v| v).collect()
}

The assembly generated for this function really confuses me, because an otherwise identical function will compile very differently when x: Vec<u32>. Any idea why one of them appears to be O(1) and the other appears to be O(n)? They are both equivalent to just returning their parameter, and I don't see why the compiler can figure that out for some types but not others.

Presumably the issue that’s mentioned in that internal module-level documentation is relevant here, too:

https://github.com/rust-lang/rust/issues/79308

1 Like

Is it safe to transmute Vec

No, it isn't. It's not safe to transmute any Vec. The layout of Vec is #[repr(Rust)], so it's not guaranteed that Vec<U> has the same layout as Vec<T> for U != T.

In general, you should basically never transmute pointer-like types, and instead use pointer casting to perform a single level of type punning. (I don't believe there is ever a good reason and/or a sound way to hop through multiple levels of pointers while type punning. Just do the naïve thing first, and try to be clever and unsafe only when it makes a measurable difference. More often than not, it doesn't.)

7 Likes

Assuming I didn’t make any mistakes, this code should be sound, and it optimizes properly

use std::mem::ManuallyDrop;

pub fn convert(x: Vec<Vec<char>>) -> Vec<Vec<u32>> {
    // fallback implementation in case of weird future changes to (unstable) `Vec` layout
    if std::alloc::Layout::new::<Vec<u32>>() != std::alloc::Layout::new::<Vec<char>>() {
        return x.into_iter().map(|v| v.into_iter().map(|c| c as u32).collect()).collect();
    }
    // inline the definition of (still unstable) Vec::into_raw_parts:
    let mut x = ManuallyDrop::new(x);
    let (x_ptr, len, cap) = (x.as_mut_ptr(), x.len(), x.capacity());
    // ^ end of Vec::into_raw_parts
    let end = unsafe {
        x_ptr.add(len)
    };
    let mut x_ptr_i = x_ptr;
    while x_ptr_i < end {
        unsafe {
            let v = x_ptr_i.read();
            // inline the definition of (still unstable) Vec::into_raw_parts:
            let mut v = ManuallyDrop::new(v);
            let (v_ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
            // ^ end of Vec::into_raw_parts
            let v_ptr = v_ptr.cast::<u32>();
            let v = Vec::from_raw_parts(v_ptr, len, cap);
            x_ptr_i.cast::<Vec<u32>>().write(v);
            x_ptr_i = x_ptr_i.add(1);
        }
    }
    unsafe {
        Vec::from_raw_parts(x_ptr.cast::<Vec<u32>>(), len, cap)
    }
}
6 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.