Casting a raw pointer requires size at compile time

Hello,

I came across something strange while using a raw pointer. If I write:

some_pointer.cast::<dyn SomeUnsizedTrait>()

I get the following error:

error[E0277]: the size for values of type `dyn SomeUnsizedTrait` cannot be known at compilation time

Why does it need to know the size at compile time? Isn't the point of a pointer to dynamically allocated data that the size of the element it points to does not have to be known at compile time?

Thank you very much in advance for any help

You are calling the cast method of *const T or *mut T, which only works for Sized target types, mostly because the exact requirements for pointer casts with metadata aren’t expressible in a function signature. (Types that don’t fulfill T: Sized are, currently, exactly those where pointers to the type carry metadata.)

Metadata for pointers to unsized types is currently either length information (one usize) for pointers to slices, or vtable pointers (so one additional memory address) for pointers to trait objects. Metadata is also a part of raw pointers to unsized types, just like for normal references, or other pointer-like types.
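To make the extra metadata visible, here is a minimal sketch that just asks the compiler for pointer sizes. The trait Shape and the function pointer_sizes are hypothetical names made up for this illustration; the point is only that raw pointers to unsized types are two words wide on current compilers.

```rust
use std::mem::size_of;

// Hypothetical trait, used only to form a trait-object pointer type.
pub trait Shape {
    fn area(&self) -> f64;
}

/// Returns the sizes in bytes of
/// (thin pointer, slice pointer, trait-object pointer).
pub fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<*const u8>(),        // address only
        size_of::<*const [u8]>(),      // address + length (one usize)
        size_of::<*const dyn Shape>(), // address + vtable pointer
    )
}
```

On a typical 64-bit target this should report 8 bytes for the thin pointer and 16 bytes for the other two, which is the difference from a plain C-style address.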

Casting (or coercing) between different pointer types generally either preserves metadata, or it can drop metadata for raw-pointer casts; or it can create new metadata for unsizing coercions. The exact requirements here are checked if you do a cast with the as operator, but don’t have any way of being written down using constraints / trait bounds on a generic function, which is why the cast method more conservatively requires the target type to simply always be Sized. (This restriction might be lifted at some point in the future, who knows…)
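As a rough sketch of those three cases (the trait Greet and the function names are assumptions invented for this example, not anything from the standard library): an unsizing cast with as creates metadata, cast to a Sized type drops it, and an array-to-slice cast turns the array length into metadata.

```rust
// Hypothetical trait for illustration.
pub trait Greet {
    fn greet(&self) -> String;
}

impl Greet for u8 {
    fn greet(&self) -> String {
        format!("u8 value {}", self)
    }
}

/// Unsizing: thin pointer -> wide pointer.
/// The compiler attaches the vtable of `u8 as Greet` as metadata.
pub fn attach_vtable(p: *const u8) -> *const dyn Greet {
    p as *const dyn Greet
}

/// Dropping metadata: wide pointer -> thin pointer.
/// The target type `u8` is Sized, so the `cast` method accepts it.
pub fn drop_metadata(p: *const dyn Greet) -> *const u8 {
    p.cast::<u8>()
}

/// Unsizing for slices: the array length 3 becomes the length metadata.
pub fn attach_len(p: *const [u8; 3]) -> *const [u8] {
    p as *const [u8]
}
```

Note that there is no function here going from *const u8 to a generic *const U with U unsized; that is the signature that cannot currently be written with trait bounds on stable Rust.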

Metadata, if it exists (i.e. if the target type isn’t Sized) has to be “valid”[1] even for raw pointers, so there is no general / unconstrained way to cast “any type” to “any other type” including unsized types, even for raw pointers.

If you could tell us what the type of your some_pointer variable actually is, we might be able to give additional comments on your specific use-case :wink:


  1. in particular vtable pointers should point to vtables matching the trait object type ↩︎


Thank you for your great response and explanation.

If you could tell us what the type of your some_pointer variable actually is, we might be able to give additional comments on your specific use-case

My specific type is dyn Fn

You are calling the cast method of *const T or *mut T, which only works for Sized target types, mostly because the exact requirements for pointer casts with metadata aren’t expressible in a function signature. (Types that don’t fulfill T: Sized are, currently, exactly those where pointers to the type carry metadata.)

I'm sorry, but I don't understand why pointers need metadata. What is it exactly? I'm also not sure why you are talking about a "function signature".

Metadata for pointers to unsized types is currently either length information (one usize) for pointers to slices, or vtable pointers (so one additional memory address) for pointers to trait objects. Metadata is also a part of raw pointers to unsized types, just like for normal references, or other pointer-like types.

Isn't the size of a pointer always the same ? Just the size of an address.

Maybe I'm more used to C but in C a pointer is as simple and stupid as an address.

Casting (or coercing) between different pointer types generally either preserves metadata, or it can drop metadata for raw-pointer casts; or it can create new metadata for unsizing coercions. The exact requirements here are checked if you do a cast with the as operator, but don’t have any way of being written down using constraints / trait bounds on a generic function, which is why the cast method more conservatively requires the target type to simply always be Sized. (This restriction might be lifted at some point in the future, who knows…)

Why does a check need to occur, since we are working with a simple address that can point anywhere (only known at run time)? We are already using something that resembles "unsafe".

Metadata, if it exists (i.e. if the target type isn’t Sized) has to be “valid” even for raw pointers, so there is no general / unconstrained way to cast “any type” to “any other type” including unsized types, even for raw pointers.

I'm sorry for insisting but I'm really afraid that I have some problem understanding what the "metadata" is. Is the "metadata" part specific to Rust ?

I'm sure that once I understand what "metadata" is, I'll be able to fully understand the rest.

Thank you for your patience and time.

In the case of dyn Trait, the metadata is a pointer to a vtable which contains pointers to the erased type's Trait methods and destructor.[1] So &dyn Trait (and Box<dyn Trait>) are twice the size of &SizedType (and Box<SizedType>).

Moreover, safe code can assume the metadata of a *const dyn Trait is a valid pointer to a Trait vtable.


  1. and other things like the erased type's size and supertrait methods ↩︎


One way to understand the implementation details better can be to look at the assembly output. For example if you put this

pub trait Trait {
    fn method(&self) -> i32;
}

impl Trait for u8 {
    fn method(&self) -> i32 {
        1234
    }
}


pub fn demo<'a>(x: &mut &'a dyn Trait, y: &'a u8) {
    *x = y
}

into the playground and click the “ASM” or “Show Assembly” button (use the 3-dots menu next to “Build” if you need to find it)

playground::demo:
	movq	%rdi, -16(%rsp)
	movq	%rsi, -8(%rsp)
	movq	%rsi, (%rdi)
	leaq	.Lanon.1dea076e61d54acb33ff2a3b5217cc93.0(%rip), %rax
	movq	%rax, 8(%rdi)
	retq

<u8 as playground::Trait>::method:
	movq	%rdi, -8(%rsp)
	movl	$1234, %eax
	retq

.Lanon.1dea076e61d54acb33ff2a3b5217cc93.0:
	.asciz	"\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000"
	.quad	<u8 as playground::Trait>::method
	.byte	1
	.byte	17
	.byte	1
	.byte	37
	.byte	14
	.byte	19
	.byte	5
	.byte	3
	.byte	14
	.byte	16
	.byte	23
	.byte	27
	.byte	14
	.byte	17
	.byte	1
	.byte	85
	.byte	23
	.byte	0
	.byte	0
	.byte	2
	.byte	52
	.byte	0
	.byte	3
	.byte	14
	.byte	73
	.byte	19
	.byte	2
	.byte	24
	.byte	0
	.byte	0
	.byte	3
	.byte	19
	.byte	1
	.byte	29
	.byte	19
	.byte	3
	.byte	14
	.byte	11
	.byte	11
	.ascii	"\210\001"
	.byte	15
	.byte	0
	.byte	0
	.byte	4
	.byte	13
	.byte	0
	.byte	3
	.byte	14
	.byte	73
	.byte	19
	.ascii	"\210\001"
	.byte	15
	.byte	56
	.byte	11
	.byte	0
	.byte	0
	.byte	5
	.byte	15
	.byte	0
	.byte	73
	.byte	19
	.byte	3
	.byte	14
	.byte	51
	.byte	6
	.byte	0
	.byte	0
	.byte	6
	.byte	36
	.byte	0
	.byte	3
	.byte	14
	.byte	62
	.byte	11
	.byte	11
	.byte	11
	.byte	0
	.byte	0
	.byte	7
	.byte	57
	.byte	1
	.byte	3
	.byte	14
	.byte	0
	.byte	0
	.byte	8
	.byte	46
	.byte	1
	.byte	17
	.byte	1
	.byte	18
	.byte	6
	.byte	64
	.byte	24
	.byte	110
	.byte	14
	.byte	3
	.byte	14
	.byte	58
	.byte	11
	.byte	59
	.byte	11
	.byte	63
	.byte	25
	.byte	0
	.byte	0
	.byte	9
	.byte	5
	.byte	0
	.byte	2
	.byte	24
	.byte	3
	.byte	14
	.byte	58
	.byte	11
	.byte	59
	.byte	11
	.byte	73
	.byte	19
	.byte	0
	.byte	0
	.byte	10
	.byte	46
	.byte	1
	.byte	17
	.byte	1
	.byte	18
	.byte	6
	.byte	64
	.byte	24
	.byte	110
	.byte	14
	.byte	3
	.byte	14
	.byte	58
	.byte	11
	.byte	59
	.byte	11
	.byte	73
	.byte	19
	.byte	63
	.byte	25
	.byte	0
	.byte	0
	.byte	11
	.byte	19
	.byte	1
	.byte	3
	.byte	14
	.byte	11
	.byte	11
	.ascii	"\210\001"
	.byte	15
	.byte	0
	.byte	0
	.byte	12
	.byte	15
	.byte	0
	.byte	73
	.byte	19
	.byte	51
	.byte	6
	.byte	0
	.byte	0
	.byte	13
	.byte	19
	.byte	0
	.byte	3
	.byte	14
	.byte	11
	.byte	11
	.ascii	"\210\001"
	.byte	15
	.byte	0
	.byte	0
	.byte	14
	.byte	1
	.byte	1
	.byte	73
	.byte	19
	.byte	0
	.byte	0
	.byte	15
	.byte	33
	.byte	0
	.byte	73
	.byte	19
	.byte	34
	.byte	13
	.byte	55
	.byte	11
	.byte	0
	.byte	0
	.byte	16
	.byte	36
	.byte	0
	.byte	3
	.byte	14
	.byte	11
	.byte	11
	.byte	62
	.byte	11
	.byte	0
	.byte	0
	.byte	0

oh hey that’s a bit lengthy… but we can use “Release” builds to help cut things down… so let’s try that

No symbols detected — they may have been optimized away.
Add the #[unsafe(no_mangle)] attribute to
functions you want to see assembly for. Generic functions
only generate assembly when concrete types are provided.

ah, and it even tells you how to fix that situation, so let’s try it

<u8 as playground::Trait>::method:
	movl	$1234, %eax
	retq

demo:
	movq	%rsi, (%rdi)
	leaq	.Lanon.fe01ce2a7fbac8fafaed7c982a04e229.0(%rip), %rax
	movq	%rax, 8(%rdi)
	retq

.Lanon.fe01ce2a7fbac8fafaed7c982a04e229.0:
	.asciz	"\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000"
	.quad	<u8 as playground::Trait>::method

Nice! So here we see what *x = y does, and hence how a value of type &'a dyn Trait can be produced (and then written to the target of the argument pointer x):

demo:
	movq	%rsi, (%rdi)
	leaq	.Lanon.fe01ce2a7fbac8fafaed7c982a04e229.0(%rip), %rax
	movq	%rax, 8(%rdi)
	retq

The first line moves %rsi into (%rdi). The latter is the assembly notation for “target of pointer/address”. Here one can infer that %rsi holds our function argument y, and %rdi holds our function argument x :wink:

The interesting part is the next 2 lines

leaq	.Lanon.fe01ce2a7fbac8fafaed7c982a04e229.0(%rip), %rax

just loads a pointer to the vtable into %rax;[1] the vtable itself lives in static memory as part of the program code, and you can find it right below the function in the assembly. Then movq %rax, 8(%rdi) moves that vtable pointer into 8(%rdi), which is assembly notation for “dereference %rdi after offsetting by 8 bytes”. So that explains how a &dyn Trait reference is made: it consists of one pointer to the actual target, and right after that one pointer to the vtable.
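The same two words can be observed from Rust itself. The sketch below reuses the Trait from this post and adds a hypothetical helper, split_wide, that reinterprets a trait-object pointer as two machine words. This assumes the current, unspecified representation of wide pointers (two pointer-sized words); it is only meant for poking around, not something to rely on.

```rust
use std::mem::transmute;

pub trait Trait {
    fn method(&self) -> i32;
}

impl Trait for u8 {
    fn method(&self) -> i32 {
        1234
    }
}

/// Reinterprets a trait-object pointer as two machine words.
/// ASSUMPTION: a wide pointer is exactly two pointer-sized words
/// (data address and vtable address). This is an implementation
/// detail; `transmute` refuses to compile if the sizes differ.
pub fn split_wide(p: *const dyn Trait) -> [*const (); 2] {
    unsafe { transmute::<*const dyn Trait, [*const (); 2]>(p) }
}
```

Calling split_wide on a pointer produced from a *const u8 should give back that same address as one of the two words, with the other word being the address of the vtable seen in the assembly.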

The same kind of behavior will happen if you use raw-pointer types, e.g. (playground) with this function added

#[unsafe(no_mangle)]
pub unsafe fn demo_raw(x: *mut *const dyn Trait, y: *const u8) {
    unsafe {
        *x = y
    }
}

it shows up in the assembly as follows

<u8 as playground::Trait>::method:
	movl	$1234, %eax
	retq

demo:
	movq	%rsi, (%rdi)
	leaq	.Lanon.2aea8d44369186b67761c05407ddfe3e.0(%rip), %rax
	movq	%rax, 8(%rdi)
	retq

demo_raw:
	movq	%rsi, (%rdi)
	leaq	.Lanon.2aea8d44369186b67761c05407ddfe3e.0(%rip), %rax
	movq	%rax, 8(%rdi)
	retq

.Lanon.2aea8d44369186b67761c05407ddfe3e.0:
	.asciz	"\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000"
	.quad	<u8 as playground::Trait>::method

In case you’re wondering about the "\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000" part: it is rendered a bit oddly in the assembly, but it generally consists of

  • drop code for the type
  • type’s size
  • type’s alignment

It is rendered here as a “string literal”, which is additionally confusing because .asciz appends the terminating zero byte implicitly, so you only see 7 bytes in the trailing \001\000\000\000\000\000\000. The values are little-endian, and the leading eight zero bytes are a null drop entry because u8 has no drop code.
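The size and alignment entries are what the standard functions size_of_val and align_of_val consult when they are handed a trait object, since the concrete type is erased at that point. A small sketch, reusing the Trait from above (erased_layout is a made-up helper name):

```rust
use std::mem::{align_of_val, size_of_val};

pub trait Trait {
    fn method(&self) -> i32;
}

impl Trait for u8 {
    fn method(&self) -> i32 {
        1234
    }
}

/// Returns (size, alignment) of the value behind a trait object.
/// The concrete type is unknown here, so both numbers have to come
/// from the metadata carried by the reference.
pub fn erased_layout(obj: &dyn Trait) -> (usize, usize) {
    (size_of_val(obj), align_of_val(obj))
}
```

For a u8 behind the reference this yields (1, 1), matching the two \001 words in the vtable dump.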

You can see this by changing the type, e.g. with

#[unsafe(no_mangle)]
pub fn demo_string<'a>(x: &mut &'a dyn Trait, y: &'a String) {
    *x = y
}

(playground) we can get

demo_string:
	movq	%rsi, (%rdi)
	leaq	.Lanon.1d4a3cc080dab32f19d796fbe22cc522.0(%rip), %rax
	movq	%rax, 8(%rdi)
	retq

.Lanon.1d4a3cc080dab32f19d796fbe22cc522.0:
	.quad	core::ptr::drop_in_place<alloc::string::String>
	.asciz	"\030\000\000\000\000\000\000\000\b\000\000\000\000\000\000"
	.quad	<alloc::string::String as playground::Trait>::method

where \b is “backspace” (ASCII code 8), i.e. the alignment of a String, 8 (on a 64-bit system); and \030 is an octal escape, 030 = 3 * 8 = 24, the size of a String (which consists of a pointer, a length, and a capacity: 3 word-sized values).
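If the escapes are hard to read, the decoding can be done mechanically. The helper below (decode_le_word is a name invented for this sketch) pads the bytes to 8 and reads them as a little-endian word, which also accounts for the zero byte that .asciz leaves implicit.

```rust
/// Decodes up to 8 bytes from an assembler string as a
/// little-endian 64-bit word. Missing bytes are treated as zero,
/// which covers the implicit terminator of `.asciz`.
pub fn decode_le_word(bytes: &[u8]) -> u64 {
    let mut word = [0u8; 8];
    let n = bytes.len().min(8);
    word[..n].copy_from_slice(&bytes[..n]);
    u64::from_le_bytes(word)
}
```

For example, decode_le_word(b"\x18\0\0\0\0\0\0\0") gives 24 (octal 030 is hex 18), and the seven visible bytes b"\x08\0\0\0\0\0\0" give 8.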

If you want to read this as Rust code instead, the current vtable implementation details could be simulated as follows (note that this relies on implementation details, is not a sound way to create vtables, and can break in the future).

#[repr(C)]
struct TraitVtable<T> {
    drop: Option<unsafe fn(*mut T)>,
    size: usize,
    align: usize,
    method: fn(&T) -> i32,
}

static VTABLE_TRAIT_FOR_U8: TraitVtable<u8> = TraitVtable {
    drop: None,
    size: std::mem::size_of::<u8>(),
    align: std::mem::align_of::<u8>(),
    method: <u8 as Trait>::method,
};

#[unsafe(no_mangle)]
pub fn demo_manually<'a>(x: *mut *const dyn Trait, y: *const u8) {
    unsafe {
        let x: *mut u8 = x.cast();
        x.cast::<*const u8>().write(y);
        x.offset(8).cast::<&'static TraitVtable<u8>>().write(&VTABLE_TRAIT_FOR_U8);
    }
}

(playground)

<u8 as playground::Trait>::method:
	movl	$1234, %eax
	retq

demo_manually:
	movq	%rsi, (%rdi)
	leaq	playground::VTABLE_TRAIT_FOR_U8(%rip), %rax
	movq	%rax, 8(%rdi)
	retq

It doesn’t show the contents of playground::VTABLE_TRAIT_FOR_U8 here in the filtered assembly view of the playground unfortunately (I think on godbolt.org, you might get a better output), but if you uncomment the original demo next to it, you can get this result

<u8 as playground::Trait>::method:
	movl	$1234, %eax
	retq

demo_manually:
	movq	%rsi, (%rdi)
	leaq	.Lanon.b7685d77080b46e47a9a83f0de6d6703.0(%rip), %rax
	movq	%rax, 8(%rdi)
	retq

demo_u8:
	movq	%rsi, (%rdi)
	leaq	.Lanon.b7685d77080b46e47a9a83f0de6d6703.0(%rip), %rax
	movq	%rax, 8(%rdi)
	retq

.Lanon.b7685d77080b46e47a9a83f0de6d6703.0:
	.asciz	"\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000\001\000\000\000\000\000\000"
	.quad	<u8 as playground::Trait>::method

where LLVM actually managed to get rid of the second “copy” of this static data and unified the real vtable, .Lanon.b7685d77080b46e47a9a83f0de6d6703.0, with our hand-written one, hence confirming it was accurate. With String in place of u8 it would work as well (see this playground).


  1. this is involving %rip, the instruction pointer, because these kinds of addresses are produced – assisted by the assembler – as relative addresses, as offsets from the current instruction’s address, so the program code can work no matter where exactly into memory it has been loaded ↩︎


Thank you for your great answer.

It really helped me understand how it worked under the hood.

So if I understand correctly, the restriction is there to make sure that the vtable is known at compile time and at run time, so that the cast is safe?

Thank you for your time and help