Newtype problem: reinterpret_cast a &[Bar] to &[Foo]

I'm using the feature "Newtype" to follow the orphan rules, but I wondered if I can make my new type totally the same as the old one, for example:

struct Foo;

#[derive(Deref)]
struct Bar(Foo);

fn foos(foos: &[Foo]) {

}

fn main() {
    let bars = vec!{Bar{0: Foo{}}, Bar{0: Foo{}}, Bar{0: Foo{}}};
    foos(bars.as_slice());
}

In this case, I have an API foos which takes &[Foo], but I only have a Vec<Bar>. To call the API, I have two choices:

  1. Create a Vec<Foo> from the Vec<Bar>, but this makes a useless copy if the memory layout of Foo is the same as that of Bar.
  2. Use unsafe code to force-cast the Vec<Bar> to a Vec<Foo>.

So my question is:

  1. Are the memory layouts of Bar and Foo the same? (In the real code, I even added #[repr(C)] to try to make the layouts the same, but I'm not sure whether that helps.) If they are not, how do I make them the same?

  2. If I do make the layouts the same, is there any safe way to turn a Vec<Bar> into a Vec<Foo>?

You can use #[repr(transparent)] to guarantee this.

I believe this would make something like this sound:

let original: Vec<Foo> = vec![];
let bar_vec = unsafe {
    // Ensure the original vector is not dropped.
    let mut original = std::mem::ManuallyDrop::new(original);
    Vec::from_raw_parts(
        original.as_mut_ptr() as *mut Bar,
        original.len(),
        original.capacity(),
    )
};

(example adapted from code in one of the transmute doc examples)


It is not guaranteed if the struct is repr(Rust) (this is the default). If your struct is repr(C) and contains only a single field, then its layout is identical to that of the single field. If this is your intent, you can use repr(transparent) to explicitly specify that the layout is identical to that of the singular non-zero-sized field. (repr(transparent) also has ABI implications that repr(C) doesn't.)

Yes, with some caveats (assuming the types have identical layout).

If you have &[Bar] and want &[Foo], you can pointer cast between the two types. This is unsafe, as the compiler can't check this for you, but it is sound, as arrays/slices have defined layout. This is &*(slice as *const [Bar] as *const [Foo]).
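As a minimal sketch of that pointer cast, assuming Bar is marked #[repr(transparent)] (the u32 field on Foo and the helper name bars_as_foos are made up for illustration):

```rust
#[derive(Debug, PartialEq)]
struct Foo(u32); // hypothetical payload, only here so there is something to read

// `repr(transparent)` guarantees that `Bar` has the same layout as `Foo`.
#[repr(transparent)]
#[allow(dead_code)] // the field is only reached through the cast below
struct Bar(Foo);

/// Views a slice of `Bar` as a slice of `Foo` without copying.
fn bars_as_foos(bars: &[Bar]) -> &[Foo] {
    // SAFETY: `Bar` is `repr(transparent)` over `Foo`, so the elements have
    // identical size, alignment, and validity. The result borrows from
    // `bars`, so the original lifetime is kept.
    unsafe { &*(bars as *const [Bar] as *const [Foo]) }
}

fn main() {
    let bars = vec![Bar(Foo(1)), Bar(Foo(2)), Bar(Foo(3))];
    let foos = bars_as_foos(&bars);
    assert_eq!(foos.len(), 3);
    assert_eq!(foos[0], Foo(1));
}
```

Keeping the unsafe block inside one small function means the rest of the code only ever sees a safe `&[Bar] -> &[Foo]` conversion.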

Transmuting between Vec<Bar> and Vec<Foo>, however, is unsound. This is because Vec is itself repr(Rust), which means that its field layout is not guaranteed. The field order could differ between the two types, in which case you have UB.

transmute::<&[Bar], &[Foo]> is also technically[1] unsound, for the same reason: the data pointer and length are not restricted to being in the same order.

However, you can still get from Vec<Bar> to Vec<Foo> soundly. You have to use Vec::into_raw_parts to split the Vec into its (ptr, len, cap) triple, cast the pointer from *mut Bar to *mut Foo, and then call Vec::from_raw_parts to reconstitute the Vec<Foo>. (Vec::into_raw_parts was nightly-only when this was written; on stable, wrapping the Vec in ManuallyDrop and reading as_mut_ptr, len, and capacity gives the same split.) This is explicitly documented as allowed by the documentation of these methods, so long as Bar and Foo have the same size and alignment.
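A sketch of that split-and-rebuild approach on stable Rust, using ManuallyDrop in place of Vec::into_raw_parts. The function name bars_into_foos and the u32 payload are assumptions for the example, and Bar is assumed to be #[repr(transparent)]:

```rust
use std::mem::ManuallyDrop;

#[derive(Debug, PartialEq)]
struct Foo(u32); // hypothetical payload

#[repr(transparent)]
#[allow(dead_code)] // the field is only reached through the cast below
struct Bar(Foo);

/// Converts a `Vec<Bar>` into a `Vec<Foo>`, reusing the same allocation.
fn bars_into_foos(bars: Vec<Bar>) -> Vec<Foo> {
    // Keep the original vector from freeing the buffer when it goes out of scope.
    let mut bars = ManuallyDrop::new(bars);
    let (ptr, len, cap) = (bars.as_mut_ptr(), bars.len(), bars.capacity());
    // SAFETY: the pointer, length, and capacity come from a live `Vec<Bar>`
    // that will not be dropped, and `Bar` has the same size and alignment as
    // `Foo` because it is `repr(transparent)`.
    unsafe { Vec::from_raw_parts(ptr.cast::<Foo>(), len, cap) }
}

fn main() {
    let bars = vec![Bar(Foo(1)), Bar(Foo(2))];
    let foos = bars_into_foos(bars);
    assert_eq!(foos, vec![Foo(1), Foo(2)]);
}
```

Because the buffer is handed over rather than copied, the data pointer and capacity of the resulting vector are the same as those of the input.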

If you just need a slice reference, though, I'd recommend just casting that, rather than the entire vector. The ref-cast crate automates and makes safe the single-element case, but I don't know of a crate which also handles the slice transmute case.


[1] The Vec transmute case has a practical chance of breaking on some future rustc version. The slice transmute is probably de facto stable, because people already do this transmute and there is no real reason for different slice types to have different field orders, but you still shouldn't rely on this.


@CAD97 @daboross thanks for helping. What if my Bar is a little more complicated:

#[repr(C)]
#[derive(Deref)]
pub(crate) struct Bar<'a> {
    #[deref]
    foo: Foo,
    phantom_data: PhantomData<&'a BarBuilder>,
}

In this case, I add a PhantomData field. As far as I know, this field does not change the memory layout at all, so can I still replace repr(C) with repr(transparent) to make sure the layouts are the same?

If #[repr(transparent)] compiles, it will have the same layout. (PhantomData is a zero sized type with no extra alignment, so it doesn't impact layout.)
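One way to convince yourself is to compare size and alignment directly. The sketch below uses stand-in definitions for Foo and BarBuilder (the real ones are not shown in the thread) and a hypothetical helper same_layout:

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

#[allow(dead_code)]
struct Foo(u64); // stand-in for the real `Foo`

struct BarBuilder; // stand-in for the real `BarBuilder`

// Compiles because `foo` is the only field that is not zero-sized:
// `PhantomData` has size 0 and alignment 1.
#[repr(transparent)]
#[allow(dead_code)]
struct Bar<'a> {
    foo: Foo,
    phantom_data: PhantomData<&'a BarBuilder>,
}

/// Returns true if `Bar` and `Foo` agree in size and alignment.
fn same_layout() -> bool {
    size_of::<Bar<'static>>() == size_of::<Foo>()
        && align_of::<Bar<'static>>() == align_of::<Foo>()
}

fn main() {
    assert!(same_layout());
    assert_eq!(size_of::<PhantomData<&'static BarBuilder>>(), 0);
}
```

Adding a second field that is not zero-sized to Bar would make the #[repr(transparent)] attribute fail to compile, which is the check the answer above relies on.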


And how bad is it to cast between slices? I don't really understand the explanation :rofl:. It looks to me like a linkage problem, so if my code is all compiled from source at once (without linking any prebuilt libraries), it is fine, right?

Casting between slices with &*(slice as *const [Bar] as *const [Foo]) is always ok if you mark Bar with #[repr(transparent)]. Casting the slices with transmute is always sketchy. How your code is linked is irrelevant.


Do you mean the newtype pattern, or an (unstable) feature of Nightly Rust?

If you need/want it, you should probably limit the usage to a single occurrence (a single associated function where you use unsafe), so that you don't need unsafe in the rest of your code, like this:

use std::mem::transmute;
use std::ops::Deref;

struct Foo;

#[repr(transparent)]
struct Bar(Foo);

impl Deref for Bar {
    type Target = Foo;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

impl Bar {
    fn deref_slice(slice: &[Bar]) -> &[Foo] {
        unsafe {
            transmute::<&[Bar], &[Foo]>(slice)
            // or maybe better:
            //std::slice::from_raw_parts((slice as *const [Bar]).cast(), slice.len())
        }
    }
}

fn foos(foos: &[Foo]) {
    #![allow(unused_variables)]
}

fn main() {
    let bars = vec!{Bar{0: Foo{}}, Bar{0: Foo{}}, Bar{0: Foo{}}};
    foos(Bar::deref_slice(&bars));
}


Not sure if an associated function is the best. You might also use an extension trait on the particular slice type, but also see the comments on my linked post. Another alternative would be to use an ordinary free standing function (that internally uses unsafe) to perform the transmutation in your case.
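For comparison, the extension-trait variant could look roughly like this. The trait name AsFooSlice, the method as_foo_slice, and the u8 payload are invented for the sketch, and it uses the pointer cast rather than transmute:

```rust
struct Foo(u8); // hypothetical payload

#[repr(transparent)]
#[allow(dead_code)] // the field is only reached through the cast below
struct Bar(Foo);

/// Extension trait that adds a `&[Bar] -> &[Foo]` view to slices of `Bar`.
trait AsFooSlice {
    fn as_foo_slice(&self) -> &[Foo];
}

impl AsFooSlice for [Bar] {
    fn as_foo_slice(&self) -> &[Foo] {
        // SAFETY: `Bar` is `repr(transparent)` over `Foo`.
        unsafe { &*(self as *const [Bar] as *const [Foo]) }
    }
}

/// Stand-in for the API that only accepts `&[Foo]`.
fn count_foos(foos: &[Foo]) -> usize {
    foos.len()
}

fn main() {
    let bars = vec![Bar(Foo(1)), Bar(Foo(2)), Bar(Foo(3))];
    // Method lookup auto-derefs `Vec<Bar>` to `[Bar]`, so this works on the Vec.
    assert_eq!(count_foos(bars.as_foo_slice()), 3);
    assert_eq!(bars.as_foo_slice()[2].0, 3);
}
```

The trade-off is mostly ergonomic: the trait has to be in scope at the call site, while an associated or free function is called by path.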

That is interesting… That would mean the following would be better?

std::slice::from_raw_parts((slice as *const [Bar]).cast(), slice.len())

Or did I make a mistake there?

I rely on transmuting references in my own code, i.e. doing things like transmute::<&[Bar], &[Foo]>(slice). But I'm doing so because apparently std does the same thing, e.g. in str::as_bytes:

#[lang = "str"]
#[cfg(not(test))]
impl str {
    /* … */
    #[stable(feature = "rust1", since = "1.0.0")]
    #[rustc_const_stable(feature = "str_as_bytes", since = "1.39.0")]
    #[must_use]
    #[inline(always)]
    #[allow(unused_attributes)]
    pub const fn as_bytes(&self) -> &[u8] {
        // SAFETY: const sound because we transmute two types with the same layout
        unsafe { mem::transmute(self) }
    }
    /* … */

P.S.: I just noticed that this may not be the same case as here, since &str gets transmuted to &[u8], and not &[some_type] to &[some_other_type].

@jbe The standard library can rely on implementation details such as how this specific version of the compiler happens to lay out slices, since the standard library is shipped together with the compiler. We can't rely on those things unless Rust guarantees that they won't change in later releases.

So the transmute you posted is not guaranteed to be correct. The shortest way to do the cast is

&*(slice as *const [Bar] as *const [Foo])

but using slice::from_raw_parts is also ok.


I'm doing it in this code in mmtkvdb.

I see!

So I should probably modify the code like this:

 /// Implement [`Storable`] for variable-sized types that do not require
 /// alignment and can be simply transmuted
 macro_rules! impl_storable_transmute_varsize_trivial_cmp {
     ($type:ty, $owned:ty) => {
         unsafe impl Storable for $type {
             const CONST_BYTES_LEN: bool = false;
             const TRIVIAL_CMP: bool = true;
             type AlignedRef<'a> = &'a Self;
             type BytesRef<'a> = &'a [u8];
             fn to_bytes(&self) -> Self::BytesRef<'_> {
-                unsafe { transmute::<&Self, &[u8]>(self) }
+                let ptr = self as *const Self as *const [u8];
+                unsafe { &*ptr }
             }
             unsafe fn from_bytes_unchecked(bytes: &[u8]) -> Self::AlignedRef<'_> {
-                transmute::<&[u8], &Self>(bytes)
+                &*(bytes as *const [u8] as *const Self)
             }
         }
         unsafe impl Storable for $owned {
             const CONST_BYTES_LEN: bool = false;
             const TRIVIAL_CMP: bool = true;
             type AlignedRef<'a> = Owned<Self>;
             type BytesRef<'a> = &'a [u8];
             fn to_bytes(&self) -> Self::BytesRef<'_> {
-                unsafe { transmute::<&$type, &[u8]>(&self) }
+                let ptr = self as &$type as *const $type as *const [u8];
+                unsafe { &*ptr }
             }
             unsafe fn from_bytes_unchecked(bytes: &[u8]) -> Self::AlignedRef<'_> {
-                Owned(transmute::<&[u8], &$type>(bytes).to_owned())
+                let ptr = bytes as *const [u8] as *const $type;
+                Owned((&*ptr).to_owned())
             }
         }
     };
 }
 
 impl_storable_transmute_varsize_trivial_cmp!([bool], Vec<bool>);
 impl_storable_transmute_varsize_trivial_cmp!([i8], Vec<i8>);
 impl_storable_transmute_varsize_trivial_cmp!([u8], Vec<u8>);
 impl_storable_transmute_varsize_trivial_cmp!(str, String);

I hope that is sound then?

P.S.: Note that size_of::<bool>() is by now guaranteed to be 1. However, I'm not sure I can rely on the metadata of str and [u8] being compatible. The wide pointer normally stores the length as a number of elements, right? But I'm not sure I can rely on str storing the length in bytes rather than in some other unit :frowning:


In other words, is the following really technically sound (assuming it's done in user code and not within std)?

fn as_bytes(s: &str) -> &[u8] {
    let ptr = s as *const str as *const [u8];
    unsafe { &*ptr }
}

fn main() {
    let s = "ABC";
    let v = as_bytes(s);
    println!("{v:?}");
}


Output:

[65, 66, 67]

Not sure. However, you could write the impl directly and call the str::as_bytes or str::from_utf8_unchecked conversion methods which definitely do it correctly.
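A rough sketch of what the hand-written str case could look like when it goes through the standard library instead of a cast. The free functions below are hypothetical stand-ins for the trait methods, not the actual Storable impl:

```rust
/// Borrows the UTF-8 bytes of a string slice (safe, provided by std).
fn str_to_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}

/// Reinterprets bytes as a string slice without checking them.
///
/// # Safety
///
/// `bytes` must be valid UTF-8.
unsafe fn bytes_to_str_unchecked(bytes: &[u8]) -> &str {
    // SAFETY: the caller guarantees that `bytes` is valid UTF-8.
    unsafe { std::str::from_utf8_unchecked(bytes) }
}

fn main() {
    let bytes = str_to_bytes("ABC");
    assert_eq!(bytes, b"ABC");

    // SAFETY: `bytes` was just obtained from a `&str`, so it is valid UTF-8.
    let s = unsafe { bytes_to_str_unchecked(bytes) };
    assert_eq!(s, "ABC");
}
```

This way the question of how str stores its length never comes up in user code, because std handles the conversion in both directions.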


Maybe that's (in theory) the best solution, but it would bloat up my code even more. I'll consider doing it though or maybe find another solution.

But this startled me. I checked the Rust reference on "Type Cast Expressions" and I can't find any explanation of how type casts of wide pointers work.

Consider the following example:

fn trns(slice: &[u16]) -> &[u8] {
    let ptr = slice as *const [u16] as *const [u8];
    unsafe { &*ptr }
}

fn main() {
    let x = [1, 2, 3, 4, 5];
    let y = trns(&x);
    println!("{y:?}");
}


Output:

[1, 0, 2, 0, 3]

The output is, of course, dependent on endianness, and the example doesn't make much sense, but it should be sound because size_of::<u16>() is never smaller than size_of::<u8>().

But is this example really guaranteed not to be UB?

What if a future version of Rust stores the beginning and end address for *const [u8] but stores beginning and length for *const [u16]? Then the above example would crash, unless we have some guarantees on what as *const [u16] as *const [u8] really does (e.g. doing the conversion for us).

I did not find any explanation of how casting pointers really works in the case of slices, so I would like to state the hypothesis that maybe this is "technically" just as unsound as using transmute on references. :exploding_head: (But maybe it is defined somewhere?)
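To make the difference concrete, here is a small sketch with two helpers (both names are made up). cast_view repeats the pointer cast from the example above, which keeps the element count of 5, matching the five values in the output shown. byte_view builds the slice with slice::from_raw_parts, where the length is computed explicitly and so does not depend on what the cast does with the pointer metadata:

```rust
use std::mem::size_of_val;
use std::slice;

/// Same cast as `trns` above: the element count is carried over unchanged.
fn cast_view(slice: &[u16]) -> &[u8] {
    // SAFETY: `u8` has alignment 1 and the first `slice.len()` bytes lie
    // inside the original allocation.
    unsafe { &*(slice as *const [u16] as *const [u8]) }
}

/// Explicit construction: a thin pointer plus a length given in bytes.
fn byte_view(slice: &[u16]) -> &[u8] {
    // SAFETY: the pointer is valid for `size_of_val(slice)` initialized
    // bytes, and `u8` has no alignment or validity requirements.
    unsafe { slice::from_raw_parts(slice.as_ptr().cast::<u8>(), size_of_val(slice)) }
}

fn main() {
    let x: [u16; 5] = [1, 2, 3, 4, 5];
    println!("{}", cast_view(&x).len()); // 5 elements, i.e. only half the bytes
    println!("{}", byte_view(&x).len()); // 10 bytes, the whole array
}
```

Only the lengths are printed here, because the byte values themselves depend on endianness.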


To get back to the OP, …

… is there really a guarantee that if Bar is #[repr(transparent)], we can safely do as *const [Bar] as *const [Foo]? And if so, why?

I would assume we can rely on [Bar] and [Foo] having the same memory representation (but I'm not even 100% sure of that), but does this imply that *const [Bar] and *const [Foo] have the same memory representation too? And even if that's the case, where is it defined what the cast *const [Bar] as *const [Foo] really does?

See also Rust's reference on its memory model. :grin:


If the hypothesis that this isn't well-defined holds, then the OP should use std::slice::from_raw_parts:
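For the Bar and Foo types from the original question, that variant could be sketched as follows (the u32 payload and the function name are assumptions, and Bar is assumed to be #[repr(transparent)]). Only a thin pointer is cast here, and the length is passed along separately:

```rust
use std::slice;

#[derive(Debug, PartialEq)]
struct Foo(u32); // hypothetical payload

#[repr(transparent)]
#[allow(dead_code)] // the field is only reached through the cast below
struct Bar(Foo);

/// Views `&[Bar]` as `&[Foo]` via a thin-pointer cast plus an explicit length.
fn bars_as_foos(bars: &[Bar]) -> &[Foo] {
    // SAFETY: `Bar` is `repr(transparent)` over `Foo`, so `bars.len()`
    // elements of `Foo` occupy exactly the memory borrowed by `bars`.
    unsafe { slice::from_raw_parts(bars.as_ptr().cast::<Foo>(), bars.len()) }
}

fn main() {
    let bars = vec![Bar(Foo(10)), Bar(Foo(20))];
    assert_eq!(bars_as_foos(&bars), [Foo(10), Foo(20)].as_slice());
}
```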

And getting back to my transmutation, probably the right fix would be:

 /// Implement [`Storable`] for variable-sized types that do not require
 /// alignment and can be simply transmuted
 macro_rules! impl_storable_transmute_varsize_trivial_cmp {
-    ($type:ty, $owned:ty) => {
+    ($elem:ty, $type:ty, $owned:ty) => {
         unsafe impl Storable for $type {
             const CONST_BYTES_LEN: bool = false;
             const TRIVIAL_CMP: bool = true;
             type AlignedRef<'a> = &'a Self;
             type BytesRef<'a> = &'a [u8];
             fn to_bytes(&self) -> Self::BytesRef<'_> {
-                unsafe { transmute::<&Self, &[u8]>(self) }
+                unsafe {
+                    slice::from_raw_parts(self as *const Self as *const u8, size_of_val(self))
+                }
             }
             unsafe fn from_bytes_unchecked(bytes: &[u8]) -> Self::AlignedRef<'_> {
-                transmute::<&[u8], &Self>(bytes)
+                slice::from_raw_parts(
+                    bytes as *const [u8] as *const $elem,
+                    bytes.len() / size_of::<$elem>(),
+                )
             }
         }
         unsafe impl Storable for $owned {
             const CONST_BYTES_LEN: bool = false;
             const TRIVIAL_CMP: bool = true;
             type AlignedRef<'a> = Owned<Self>;
             type BytesRef<'a> = &'a [u8];
             fn to_bytes(&self) -> Self::BytesRef<'_> {
-                unsafe { transmute::<&$type, &[u8]>(&self) }
+                let slice: &$type = self;
+                unsafe {
+                    slice::from_raw_parts(slice as *const $type as *const u8, size_of_val(slice))
+                }
             }
             unsafe fn from_bytes_unchecked(bytes: &[u8]) -> Self::AlignedRef<'_> {
-                Owned(transmute::<&[u8], &$type>(bytes).to_owned())
+                Owned(
+                    slice::from_raw_parts(
+                        bytes as *const [u8] as *const $elem,
+                        bytes.len() / size_of::<$elem>(),
+                    )
+                    .to_owned(),
+                )
             }
         }
     };
 }
 
-impl_storable_transmute_varsize_trivial_cmp!([bool], Vec<bool>);
-impl_storable_transmute_varsize_trivial_cmp!([i8], Vec<i8>);
-impl_storable_transmute_varsize_trivial_cmp!([u8], Vec<u8>);
-impl_storable_transmute_varsize_trivial_cmp!(str, String);
+impl_storable_transmute_varsize_trivial_cmp!(bool, [bool], Vec<bool>);
+impl_storable_transmute_varsize_trivial_cmp!(i8, [i8], Vec<i8>);
+impl_storable_transmute_varsize_trivial_cmp!(u8, [u8], Vec<u8>);
+//impl_storable_transmute_varsize_trivial_cmp!(/* what to put here? */, str, String); // this breaks now!

edit: I added / size_of::<$elem>() after bytes.len()

Notice how the last macro call breaks now, which would force me to do what @alice proposed:


I started a new thread where I'm asking about how wide-pointer casts are defined.
