Is this use of `std::mem::transmute` undefined behavior?

I was creating a graph structure in C++, and realized that the reinterpret_cast I was going to use was undefined behavior. I was wondering if the same would also be UB in Rust.

So! I have a Vec<data>, and I wanted to store indexes externally (so the data can freely be moved in memory without invalidating references). To be more precise, I have multiple Vecs, each storing a different type. To add some safeguards, I added a tag type to know which vector I want to index. For this, I created a Handle<Tag> class.

struct Handle<Tag> {
    index: usize,
    phantom: std::marker::PhantomData<Tag>,
}
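
The snippets in this post call Handle::new, which is never shown. A minimal sketch of what it could look like, assuming a plain constructor and hand-written Clone/Copy impls (both are hypothetical additions, not part of the original code):

```rust
use std::marker::PhantomData;

struct Handle<Tag> {
    index: usize,
    phantom: PhantomData<Tag>,
}

impl<Tag> Handle<Tag> {
    // Hypothetical constructor: the post uses `Handle::new` without defining it.
    fn new(index: usize) -> Self {
        Handle { index, phantom: PhantomData }
    }
}

// Implemented by hand because `#[derive(Clone, Copy)]` would add a
// `Tag: Copy` bound, and tag types usually don't need to be `Copy`.
impl<Tag> Clone for Handle<Tag> {
    fn clone(&self) -> Self {
        *self
    }
}

impl<Tag> Copy for Handle<Tag> {}
```

With these impls a handle can be copied freely even when the tag type itself is not Copy.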

Those handles are themselves stored in another container. I’m using char for the type of the data and Vec for the collection that stores the handles, but in my C++ code, it was more complex.

struct A { 
    data: Vec<char>, // can be more complicated than just `char`
    indexes: Vec<Handle<A>>, // could be stored in a `HashMap` instead
}
// impl Index<Handle<A>> for A
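
The Index impl is only hinted at in the comment above. One possible shape for it, as a sketch that assumes the handle is taken by value and that an out-of-range index should panic like slice indexing does:

```rust
use std::marker::PhantomData;
use std::ops::Index;

struct Handle<Tag> {
    index: usize,
    phantom: PhantomData<Tag>,
}

struct A {
    data: Vec<char>,
    indexes: Vec<Handle<A>>,
}

impl Index<Handle<A>> for A {
    type Output = char;

    // Only a `Handle<A>` is accepted here, so a `Handle<B>` cannot be
    // used to index `A` by mistake. Panics if the index is out of range.
    fn index(&self, handle: Handle<A>) -> &char {
        &self.data[handle.index]
    }
}
```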

Then I’m transforming the data encapsulated in A, which generates a new struct:

struct B { 
    data: Vec<char>, // the type may have changed after the transformation
    indexes: Vec<Handle<B>>,
}
// impl Index<Handle<B>> for B

The order of the elements inside data is kept during the transformation. This means that transforming A::indexes into B::indexes means creating a new vector that will contain exactly the same values.

fn some_transformation(data: Vec<char>) -> Vec<char> {
    data
}

let a = A {
    data: vec!['a', 'b', 'c'],
    indexes: vec![Handle::new(1), Handle::new(2)]
};

let b = B {
    data: some_transformation(a.data),
    indexes: a.indexes
        .iter()
        .map(|handle| Handle::new(handle.index)) // copy the value of the handle
        .collect(),
};
assert_eq!(b.data[b.indexes[0].index], 'b');

Is the following optimization valid or totally UB?

let b = B {
    data: some_transformation(a.data),
    indexes: unsafe {
        std::mem::transmute(a.indexes)
    },
};

Not sure whether the transmute is safe (I guess yes), but if you use into_iter, the Vec's capacity should be reused and at least no reallocation should occur. That's thanks to this PR.
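
A sketch of the fully safe variant described here, using into_iter so the source vector is consumed. The retag helper and the Handle::new constructor are hypothetical names, and reusing the allocation is an optimization in the standard library rather than a documented guarantee:

```rust
use std::marker::PhantomData;

struct Handle<Tag> {
    index: usize,
    phantom: PhantomData<Tag>,
}

impl<Tag> Handle<Tag> {
    // Hypothetical constructor, as used in the original post.
    fn new(index: usize) -> Self {
        Handle { index, phantom: PhantomData }
    }
}

// Hypothetical helper: rebuilds every handle with a different tag.
// No `unsafe` is involved; the source vector is consumed, which lets
// the standard library reuse its buffer when it is able to.
fn retag<Src, Dst>(handles: Vec<Handle<Src>>) -> Vec<Handle<Dst>> {
    handles
        .into_iter()
        .map(|handle| Handle::new(handle.index))
        .collect()
}
```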

You should not take my word for it, but...

Yes, it is UB as written, but you can make it valid by making Handle repr(transparent) and using Vec::from_raw_parts instead of transmute.

Incidentally, I don't think the equivalent would be UB in C++, either... maybe it depends on how you use reinterpret_cast? The main limitation C++ puts on pointer shenanigans that Rust doesn't have is the strict aliasing rule, but there doesn't seem to be any aliasing happening, so I'm not sure why it would be UB. (But I know less about C++ than I do about Rust, so definitely don't take my word for it.)

Incidentally incidentally, be sure that PhantomData<Tag> has the semantics you want (it probably doesn't). The choice of parameter affects variance, drop checking, and auto traits. The most restrictive parameter you can write (according to one definition of "restrictive") is PhantomData<*mut Tag>, so that's not a bad place to start, but that's probably not quite right either. Unfortunately there's no handy PhantomData guide I can link to; perhaps one day I'll get around to writing it.
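
To illustrate how the parameter changes auto traits, here is a sketch that uses PhantomData<fn() -> Tag>. This is one common choice for a pure tag, assuming the handle should never be treated as owning or storing a Tag; whether it is the right choice for this code is a judgment call. NotThreadSafe and assert_send_sync are hypothetical names:

```rust
use std::marker::PhantomData;

struct Handle<Tag> {
    index: usize,
    // A function pointer type stores no `Tag` value, so the handle is
    // `Send + Sync` whatever `Tag` is, and stays covariant in `Tag`.
    phantom: PhantomData<fn() -> Tag>,
}

// A tag that is neither `Send` nor `Sync` because of the raw pointer.
struct NotThreadSafe(*mut u8);

fn assert_send_sync<T: Send + Sync>() {}

fn check() {
    // Compiles only because the marker is `fn() -> Tag`; with
    // `PhantomData<Tag>` this line would be rejected.
    assert_send_sync::<Handle<NotThreadSafe>>();
}
```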

It's UB in many ways. A is not guaranteed to have the same memory layout as B, even if they have the same set of fields of the same types in the same order. And Vec<T> is not guaranteed to have the same memory layout as Vec<U>, even if it's empty.

Godbolt confirms this; however, it should be noted that the loops are not actually gone. Compiler Explorer

The C++ code is in this Stack Overflow question.

That’s right, I forgot to add #[repr(C)] or something like it to Handle<Tag>. Would it still be UB?

There isn't a guide to PhantomData, but...

(Ofc, there's also the usual stuff about Send/Sync/other auto traits.)

You can't put #[repr(C)] on Vec<T>.

That’s right. And Vec<T> cannot inherit the repr constraint from its elements. So the only valid thing to do is to use into_iter().map(...).collect(), and to fix the compiler so that this becomes a no-op.

@Soni Thanks for the link, it was very interesting. I knew the words variance, contravariance and covariance, but never understood what they meant. It finally clicked!

Since Vec is not #[repr(C)], it is always undefined behavior to transmute from Vec<A> to Vec<B> when A and B are not the exact same type.

Note that the problem with transmute on a Vec pertains specifically to transmute. It is not UB to turn a Vec<T> into a Vec<U> (for "compatible" T and U) using into_raw_parts and from_raw_parts, which is essentially the same thing (in fact, the documentation for mem::transmute specifically mentions this alternative). The only question in this case is whether Handle<A> and Handle<B> are "compatible" for this purpose, which I believe they are when marked repr(transparent).

That said, if into_iter().collect() has apparently the same performance and no unsafe, there would seem to be no reason not to use that instead.

into_iter().collect() isn’t (yet) correctly optimized out (as you can see in @SkiFire13's link).

However, using into_raw_parts()/from_raw_parts() is effectively zero cost. Compiler Explorer.

TL;DR

  • Use

    let (ptr, len, cap) = Vec::<Src>::into_raw_parts(vec);
    // SAFETY: requires the conditions on Src and Dst listed below
    unsafe { Vec::<Dst>::from_raw_parts(ptr.cast(), len, cap) }
    

    instead of transmute to "transmute" a Vec<Src> into a Vec<Dst> without instant UB; it will even be "safe" provided Src and Dst have the same alloc::Layout (mainly, same alignment) and if all the present instances of Src can "safely" be transmuted into Dst.

  • Thus, also mark Handle as repr(C) or repr(transparent) (and, ideally, go for the latter, since it is more strict), so as to meet those requirements about Src and Dst.
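
Putting both points together, a minimal sketch of the conversion. Vec::into_raw_parts has long been nightly-only, so depending on your toolchain it may not be available; this version assumes ManuallyDrop plus as_mut_ptr/len/capacity as a stand-in. The retag helper is a hypothetical name:

```rust
use std::marker::PhantomData;
use std::mem::ManuallyDrop;

// `repr(transparent)` gives every `Handle<Tag>` the layout of `usize`,
// because `PhantomData<Tag>` is zero-sized with alignment 1.
#[repr(transparent)]
struct Handle<Tag> {
    index: usize,
    phantom: PhantomData<Tag>,
}

// Hypothetical helper: changes the tag without copying or reallocating.
fn retag<Src, Dst>(handles: Vec<Handle<Src>>) -> Vec<Handle<Dst>> {
    // Keep the source vector from freeing the buffer we hand over.
    let mut handles = ManuallyDrop::new(handles);
    let (ptr, len, cap) = (handles.as_mut_ptr(), handles.len(), handles.capacity());
    // SAFETY: `Handle<Src>` and `Handle<Dst>` are both transparent
    // wrappers around `usize`, so size and alignment match and every
    // initialized element is a valid `Handle<Dst>`. The pointer, length
    // and capacity come from a live `Vec` that will not be dropped.
    unsafe { Vec::from_raw_parts(ptr.cast::<Handle<Dst>>(), len, cap) }
}
```

The resulting vector owns the original buffer, so the pointer and capacity are unchanged after the call.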
