# Is it safe to transmute Vec<Vec<char>> to Vec<Vec<u32>>?

The title really says it all. The layout of `char` and `u32` should be identical so it would make sense if this were possible, but I know that there is no guarantee that the same struct with different generics will have the same field order, so I could also believe that it's UB. I tried it and it worked, but that doesn't prove anything.

Even if it is safe, I know `transmute` should only be used if there is no reasonable alternative. Is there a better way to convert a `Vec<Vec<char>>` to `Vec<Vec<u32>>` in O(1) time and O(1) space? If not, what about O(n) time and O(1) space?

1 Like

I am pretty sure it is UB.

What does Miri say?

I can't think of something that uses constant space. You can convert from `Vec<char>` for `Vec<u32>`, using the technique suggested in the docs for `Vec::from_raw_parts`.

A straightforward

``````x.into_iter().map(|v| v.into_iter().map(|c| c as u32).collect()).collect()
``````

manages to run in O(1) space , and optimize out the inner loop, so it’s O(n) where n is the size of the outer vec only. E.g., you can see that these two functions generate the same assembly:

``````pub fn convert1(x: Vec<Vec<char>>) -> Vec<Vec<u32>> {
x.into_iter().map(|v| v.into_iter().map(|c| c as u32).collect()).collect()
}

pub fn convert2(x: Vec<Vec<char>>) -> Vec<Vec<char>> {
x.into_iter().map(|v| v).collect()
}
``````

1. there’s specialization on `.collect()` that makes such a chain avoid allocating a new `Vec` ↩︎

11 Likes

I don't know the answer to your question. I would think the sane thoughts you are and have the same question.

But I'm wondering whether there is any other approach that doesn't involve doubt the conversation at all? If there is an external interface that requires specifically `Vec<Vec<u32>>` I guess you have no choice, but if you have some level of control, could you maybe adjust the interface to use iterators or to be generic in `T: Copy + Into<u32>`?

Aren't we re-allocating the outer vector? So isn't it O(n) space, where n is the size of the outer vector?

I don’t think we are, no. As I’ve noted (after an edit) in the footnote, there’s specialization on `.collect()` that makes chains such as `.into_iter().map(…).collect()` avoid allocating a new `Vec` (as long as the size and alignment of the item types match, which they should between `Vec<char>` and `Vec<u32>`).

10 Likes

I heard about that, it's why this code can run in constant time and space:

``````pub fn convert_to_u32(vec: Vec<char>) -> Vec<u32> {
vec.into_iter().map(|c| c as u32).collect()
}
``````

Here's the generated code:

``````example::convert_to_u32:
mov     rax, rdi
mov     rcx, qword ptr [rsi]
movups  xmm0, xmmword ptr [rsi + 8]
mov     qword ptr [rdi], rcx
movups  xmmword ptr [rdi + 8], xmm0
ret
``````

I think it can reuse the allocation even if the sizes don't match, as long as the new type isn't larger.

3 Likes

Off topic: how did you create the footnote?

It’s

``````Some text ^[with a footnote] for you :-)
``````

Some text  for you 1. with a footnote ↩︎

4 Likes

How does this specialization work? Does it do a transmute of the underlying buffer pointers?

It involves traits such as this one (unfortunately hidden in newer std docs); some of the implementation then is here. See also this comment.

Also here's a link for the recent hidden module-level documentation.

3 Likes
``````pub fn convert2(x: Vec<Vec<char>>) -> Vec<Vec<char>> {
x.into_iter().map(|v| v).collect()
}
``````

The assembly generated for this function really confuses me, because an otherwise identical function will compile very differently when `x: Vec<u32>`. Any idea why one of them appears to be O(1) and the other appears to be O(n)? They are both equivalent to just returning their parameter, and I don't see why the compiler can figure that out for some types but not others.

Presumably the issue that’s mentioned in that internal module-level documentation is relevant here, too:

1 Like

Is it safe to transmute `Vec`

No, it isn't. It's not safe to transmute any `Vec`. The layout of `Vec` is `#[repr(Rust)]`, so it's not guaranteed that `Vec<U>` has the same layout as `Vec<T>` for `U != T`.

In general, you should basically never transmute pointer-like types, and instead use pointer casting to perform a single level of type punning. (I don't believe there is ever a good reason and/or a sound way to hop through multiple levels of pointers while type punning. Just do the naïve thing first, and try to be clever and `unsafe` only when it makes a measurable difference. More often than not, it doesn't.)

7 Likes

Assuming I didn’t make any mistakes, this code should be sound, and it optimizes properly

``````use std::mem::ManuallyDrop;

pub fn convert(x: Vec<Vec<char>>) -> Vec<Vec<u32>> {
// fallback implementation in case of weird future changes to (unstable) `Vec` layout
if std::alloc::Layout::new::<Vec<u32>>() != std::alloc::Layout::new::<Vec<char>>() {
return x.into_iter().map(|v| v.into_iter().map(|c| c as u32).collect()).collect();
}
// inline the definition of (still unstable) Vec::into_raw_parts:
let mut x = ManuallyDrop::new(x);
let (x_ptr, len, cap) = (x.as_mut_ptr(), x.len(), x.capacity());
// ^ end of Vec::into_raw_parts
let end = unsafe {
};
let mut x_ptr_i = x_ptr;
while x_ptr_i < end {
unsafe {
// inline the definition of (still unstable) Vec::into_raw_parts:
let mut v = ManuallyDrop::new(v);
let (v_ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
// ^ end of Vec::into_raw_parts
let v_ptr = v_ptr.cast::<u32>();
let v = Vec::from_raw_parts(v_ptr, len, cap);
x_ptr_i.cast::<Vec<u32>>().write(v);