Reinterpreting a #[repr(C)] struct's bytes as a given type

Suppose that inside a #[repr(C)] struct I hold a pointer returned to me by the global allocator with an alignment of, say, 8 bytes. I want the user to supply a type parameter T, and as long as the alignment of T is compatible with those 8 bytes, I want to return a mutable reference to the value those bytes represent.

This doesn't work:

    unsafe fn cast_unchecked_mut<Type>(&self) -> &mut Type {
        std::ptr::read_volatile::<&mut Type>(self.ptr as *const &mut Type)
    }

nor

    unsafe fn cast_unchecked_mut<Type>(&self) -> &mut Type {
        std::ptr::read::<&mut Type>(self.ptr as *const &mut Type)
    }

I would also like to mention that self.ptr is of type *mut u8

1 Like

First, you should be using &mut self to avoid UB. Next, you could just cast the pointer and dereference it:

unsafe fn cast_unchecked_mut<Type>(&mut self) -> &mut Type {
    assert!(std::mem::align_of::<Type>() <= 8);
    &mut *(self.ptr as *mut Type)
}
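
For context, here is the same thing as a self-contained sketch; the surrounding struct (RawSlot here) is just a guess based on your description of self.ptr:

use std::mem::align_of;

struct RawSlot {
    // 8-byte-aligned pointer handed back by the global allocator
    ptr: *mut u8,
}

impl RawSlot {
    /// Safety: the allocation must be big enough for `Type`, must outlive the
    /// returned reference, and no other reference to it may exist while the
    /// returned `&mut` is alive.
    unsafe fn cast_unchecked_mut<Type>(&mut self) -> &mut Type {
        assert!(align_of::<Type>() <= 8);
        &mut *(self.ptr as *mut Type)
    }
}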
3 Likes

I have mechanisms behind the scenes controlling the access order, so it's actually fine on the UB front (unless there's something weird?). I will check this new way out now, thanks!

Going from a &_ to a &mut _ is UB no matter how you do it, unless you go from a &UnsafeCell<T> to a &mut T, and even that can be UB if done incorrectly.
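
In code, the one sanctioned shape looks roughly like this (a minimal sketch, with a made-up function name):

use std::cell::UnsafeCell;

/// Safety: the caller must guarantee that no other reference to the contents,
/// shared or unique, is alive for as long as the returned `&mut` exists.
unsafe fn shared_to_mut<T>(cell: &UnsafeCell<T>) -> &mut T {
    // &UnsafeCell<T> -> *mut T -> &mut T is the only route the language blesses.
    &mut *cell.get()
}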

1 Like

Really? How so? I have atomically controlled usage of the get_unchecked_mut, so I'd figure handing out a mutable reference through an immutable one is okay?

using AtomicUsize, as an example

AtomicUsize uses UnsafeCell internally.

(lots of macro stuff to build all atomics, but they are defined like so)

#[$stable]
#[repr(C, align($align))]
pub struct $atomic_type {
    v: UnsafeCell<$int_type>,
}

Same with Cell, RefCell, Mutex, RwLock, and many more.

1 Like

Okay, great. Problem solved. For extended discussion: the code below compiles. If I control when rip_mut is called, would this still produce UB? Would there be bad alignment or mis-allocation when the data structure changes?

    /// Converts immutable self to mutable self. This is absurdly unsafe, and Rust won't let me do it in safe code.
    unsafe fn rip_mut(&self) -> *mut Self {
        (&*self as *const Self) as *mut Self
    }

That is not UB by itself; it would be UB to write through that pointer, though. (No &mut _ was made, only a *mut _, which is not special.)

&mut _ is special because it means unique, and going from a shared reference to a unique reference requires black magic (the UnsafeCell lang item).
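
For contrast, a toy sketch: making the raw pointer is fine, writing through it is not (unless the pointee sits in an UnsafeCell):

fn make_raw(x: &u32) -> *mut u32 {
    // Creating the raw pointer is allowed; a *mut _ carries no uniqueness claim.
    x as *const u32 as *mut u32
}

// But writing through it would be UB, because the u32 was never inside an
// UnsafeCell:
// unsafe { *make_raw(&value) = 5; } // UB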

3 Likes

(*self.rip_mut()).my_usize_field = 10;

Would that be UB even if it is access-controlled?

(Access-controlled via AtomicUsize)

Yes, that would write through a pointer derived from a shared reference, without going through an UnsafeCell, so it is UB.

Btw, what do you mean by access controlled by an AtomicUsize?

1 Like

Ah, this has to do with the project we were talking about in our last thread that you helped me in!

Asynchronous data editing 🙂

I make a WriteVisitor that polls Ready once its "ticket" number equals the AtomicUsize. That guarantees that only one WriteVisitor has write access at a time, roughly like the sketch below.
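
Roughly like this (a heavily simplified sketch, not the real code; the names are made up):

use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::Poll;

struct WriteVisitor {
    ticket: usize,
}

impl WriteVisitor {
    // Only the visitor whose ticket matches the shared counter becomes Ready;
    // everyone else stays Pending until the current writer bumps the counter.
    fn poll_ready(&self, current: &AtomicUsize) -> Poll<()> {
        if current.load(Ordering::Acquire) == self.ticket {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}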

Could you please link to it? I don't remember which one.

1 Like

Yeah, this sort of thing would be easiest to build using UnsafeCell directly. Then control access to that UnsafeCell with whatever means you want. Be very careful to make sure that every &mut _ uniquely refers to what it points to.
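
Something along these lines, say (a rough sketch; the field and method names are invented):

use std::cell::UnsafeCell;
use std::sync::atomic::AtomicUsize;

// The data lives in an UnsafeCell; the atomic ticket counter stands in for
// whatever mechanism decides who is currently allowed to touch it.
struct Shared<T> {
    current_ticket: AtomicUsize,
    data: UnsafeCell<T>,
}

// Sound only because access is externally synchronized by the ticket scheme.
unsafe impl<T: Send> Sync for Shared<T> {}

impl<T> Shared<T> {
    /// Safety: must only be called by the one visitor whose ticket matches,
    /// so that the returned `&mut` is unique for its entire lifetime.
    unsafe fn get_mut(&self) -> &mut T {
        &mut *self.data.get()
    }
}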

2 Likes

UnsafeCell looks nice to me. My question about using it: would the compiler lose the common optimizations that come from knowing that safe code treats mutable access as unique? Would many optimizations go away?

1 Like

It would remove any optimization that treats &_ as meaning immutable for the value wrapped in UnsafeCell<_>.

For example,

pub fn foo(x: &UnsafeCell<u32>) {
    unsafe {
        let a = *x.get();
        std::thread::yield_now();
        let b = *x.get();
        assert_eq!(a, b);
    }
}

generates

long assembly with panic handling:
<&T as core::fmt::Debug>::fmt:
	pushq	%r14
	pushq	%rbx
	pushq	%rax
	movq	%rsi, %rbx
	movq	(%rdi), %r14
	movq	%rsi, %rdi
	callq	*core::fmt::Formatter::debug_lower_hex@GOTPCREL(%rip)
	testb	%al, %al
	je	.LBB0_1
	movq	%r14, %rdi
	movq	%rbx, %rsi
	addq	$8, %rsp
	popq	%rbx
	popq	%r14
	jmpq	*core::fmt::num::<impl core::fmt::LowerHex for u32>::fmt@GOTPCREL(%rip)

.LBB0_1:
	movq	%rbx, %rdi
	callq	*core::fmt::Formatter::debug_upper_hex@GOTPCREL(%rip)
	movq	%r14, %rdi
	movq	%rbx, %rsi
	addq	$8, %rsp
	testb	%al, %al
	je	.LBB0_2
	popq	%rbx
	popq	%r14
	jmpq	*core::fmt::num::<impl core::fmt::UpperHex for u32>::fmt@GOTPCREL(%rip)

.LBB0_2:
	popq	%rbx
	popq	%r14
	jmpq	*core::fmt::num::imp::<impl core::fmt::Display for u32>::fmt@GOTPCREL(%rip)

playground::foo:
	pushq	%rbp
	pushq	%rbx
	subq	$104, %rsp
	movq	%rdi, %rbx
	movl	(%rdi), %ebp
	movl	%ebp, (%rsp)
	callq	*std::thread::yield_now@GOTPCREL(%rip)
	movl	(%rbx), %eax
	movl	%eax, 4(%rsp)
	cmpl	%eax, %ebp
	jne	.LBB1_1
	addq	$104, %rsp
	popq	%rbx
	popq	%rbp
	retq

.LBB1_1:
	movq	%rsp, %rax
	movq	%rax, 8(%rsp)
	leaq	4(%rsp), %rax
	movq	%rax, 16(%rsp)
	leaq	8(%rsp), %rax
	movq	%rax, 24(%rsp)
	leaq	<&T as core::fmt::Debug>::fmt(%rip), %rax
	movq	%rax, 32(%rsp)
	leaq	16(%rsp), %rcx
	movq	%rcx, 40(%rsp)
	movq	%rax, 48(%rsp)
	leaq	.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.3(%rip), %rax
	movq	%rax, 56(%rsp)
	movq	$3, 64(%rsp)
	movq	$0, 72(%rsp)
	leaq	24(%rsp), %rax
	movq	%rax, 88(%rsp)
	movq	$2, 96(%rsp)
	leaq	.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.5(%rip), %rsi
	leaq	56(%rsp), %rdi
	callq	*std::panicking::begin_panic_fmt@GOTPCREL(%rip)
	ud2

.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.0:
	.ascii	"assertion failed: `(left == right)`\n  left: `"

.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.1:
	.ascii	"`,\n right: `"

.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.2:
	.byte	96

.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.3:
	.quad	.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.0
	.asciz	"-\000\000\000\000\000\000"
	.quad	.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.1
	.asciz	"\f\000\000\000\000\000\000"
	.quad	.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.2
	.asciz	"\001\000\000\000\000\000\000"

.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.4:
	.ascii	"src/lib.rs"

.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.5:
	.quad	.Lanon.ec0bd99b1b4051f7d320bc576b3d0a71.4
	.asciz	"\n\000\000\000\000\000\000\000\b\000\000\000\t\000\000"

while

pub fn bar(x: &u32) {
    let a = *x;
    std::thread::yield_now();
    let b = *x;
    assert_eq!(a, b);
}

generates almost nothing (the assert is completely optimized away):

playground::bar:
	jmpq	*std::thread::yield_now@GOTPCREL(%rip)

On playground

Note that the only thing between the two gets is a thread yield!

This behavior is correct and wanted. You don't want these optimizations when you have shared mutability, as they will lead to subtle and infuriating bugs.

5 Likes

Very interesting... so, let me see if I understand this: both a and b, which are stored in different places in memory, end up holding the value that x points to.