Is there an optimization difference between UnsafeCell<T> and *mut T? When should they be used, and what should be kept in mind?

Do both UnsafeCell and *mut T reach the same optimization level, or does *mut T offer higher optimization than UnsafeCell?

When should I use UnsafeCell and *mut T? My current understanding is that UnsafeCell allows mutating a value via &self, whereas *mut T requires &mut self. What else is there?

Furthermore, what are the boundaries or constraints I must observe to avoid undefined behavior?

UnsafeCell<T> owns a value of type T whereas *mut T is a raw pointer to a value of T. So, no, in general they will not be optimized in the same way, because one thing is an owned value and the other one a pointer, i.e. a memory address (with provenance).
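
The value-versus-pointer distinction can be sketched with nothing but size_of. This relies on the documented guarantee that UnsafeCell<T> has the same in-memory representation as T; the helper name `sizes` is made up for illustration.

```rust
use std::cell::UnsafeCell;
use std::mem::size_of;

/// Returns (size of the cell, size of the plain value, size of a raw pointer).
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<UnsafeCell<u64>>(),
        size_of::<u64>(),
        size_of::<*mut u64>(),
    )
}

fn main() {
    let (cell, value, pointer) = sizes();
    // The cell stores the u64 inline, so it is exactly as large as the u64.
    assert_eq!(cell, value);
    // The raw pointer is only an address; the u64 it points to lives elsewhere.
    assert_eq!(pointer, size_of::<usize>());
    println!("cell = {cell}, value = {value}, pointer = {pointer}");
}
```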

Could you provide more detail regarding which optimizations differ and what their effects are? Which one is optimized better? Or more specifically, how do their optimization profiles impact the final machine code?

and metadata.
this is nitpicky, and I apologize for that, but all pointers carry metadata as defined by the Pointee trait. this is too often ignored, which confuses people when they see types whose metadata isn't ().
I'm not sure whether it makes sense to count provenance as part of the metadata, though strictly speaking it is "metadata".
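
A small sketch of what that metadata looks like in practice, using only stable APIs (the Pointee trait itself is unstable). The sizes shown are what current compilers produce rather than a written guarantee, and `pointer_sizes` is a hypothetical helper.

```rust
use std::mem::size_of;

/// Sizes of a thin pointer, a slice pointer, and a trait-object pointer.
fn pointer_sizes() -> (usize, usize, usize) {
    (
        size_of::<*const u8>(),
        size_of::<*const [u8]>(),
        size_of::<*const dyn std::fmt::Debug>(),
    )
}

fn main() {
    let (thin, slice, object) = pointer_sizes();
    // Thin pointer: the metadata is `()`, so only the address is stored.
    assert_eq!(thin, size_of::<usize>());
    // Slice pointer: the metadata is the element count.
    assert_eq!(slice, 2 * thin);
    // Trait-object pointer: the metadata is a pointer to the vtable.
    assert_eq!(object, 2 * thin);

    // The length metadata travels with the raw slice pointer.
    let data = [1u8, 2, 3];
    let raw: *const [u8] = &data[..];
    // SAFETY: `raw` was just created from a live slice.
    let back: &[u8] = unsafe { &*raw };
    assert_eq!(back.len(), 3);
}
```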

as Schard said, these are not comparable. one of them is a value, while the other is a pointer.
it's like asking "which is heavier, a human or water?"; there is no logical answer.

we can tell you that *mut T is less optimized than &T and &mut T, as they are all pointers and thus do the same job.
and we can tell you that UnsafeCell<T> might be less optimized than a raw T in some places, because these are both values, which serve the same job.
although if you have an UnsafeCell<T> or T by value, it shouldn't matter. it only really matters when it is behind some kind of reference.
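
To illustrate why the difference only shows up behind a reference, here is a minimal sketch using Cell, the safe wrapper built on UnsafeCell. The function names are hypothetical, and the comments describe what the compiler is permitted to assume, not what any particular build emits.

```rust
use std::cell::Cell;

/// Reads through a plain shared reference before and after a callback.
/// `&u32` promises the value cannot change, so the compiler is allowed
/// to reuse the first read instead of loading again.
fn sum_plain(x: &u32, callback: impl Fn()) -> u32 {
    let first = *x;
    callback();
    first + *x
}

/// Same shape, but `Cell<u32>` contains an `UnsafeCell<u32>`, so the value
/// may legitimately change during the callback and must be read again.
fn sum_cell(x: &Cell<u32>, callback: impl Fn()) -> u32 {
    let first = x.get();
    callback();
    first + x.get()
}

fn main() {
    let plain = 1;
    assert_eq!(sum_plain(&plain, || {}), 2);

    let cell = Cell::new(1);
    // The callback mutates the cell through another shared reference.
    assert_eq!(sum_cell(&cell, || cell.set(10)), 11);
}
```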

What would happen in a use case like this?

use std::cell::UnsafeCell;
use std::alloc::{alloc, dealloc, Layout, handle_alloc_error};
use std::ptr;

pub struct UnsafeCellBox<T, const N: usize> {
    value: UnsafeCell<Box<([T; N], usize)>>,
}

impl<T: Copy, const N: usize> UnsafeCellBox<T, N> {
    pub fn new(val: [T; N]) -> Self {
        Self {
            value: UnsafeCell::new(Box::new((val, N))),
        }
    }

    pub fn push(&self, val: T) {
        unsafe {
            let inner = &mut *self.value.get();
            if inner.1 < N {
                inner.0[inner.1] = val;
                inner.1 += 1;
            }
        }
    }

    pub fn get(&self) -> &[T; N] {
        unsafe {
            &(*self.value.get()).0
        }
    }
}

pub struct RawPointer<T, const N: usize> {
    ptr: *mut ([T; N], usize),
}

impl<T: Copy, const N: usize> RawPointer<T, N> {
    pub fn new(val: [T; N]) -> Self {
        let layout = Layout::new::<([T; N], usize)>();
        unsafe {
            let ptr = alloc(layout) as *mut ([T; N], usize);
            if ptr.is_null() {
                handle_alloc_error(layout);
            }
            ptr::write(ptr, (val, N));
            Self { ptr }
        }
    }

    pub fn push(&mut self, val: T) {
        unsafe {
            if (*self.ptr).1 < N {
                (*self.ptr).0[(*self.ptr).1] = val;
                (*self.ptr).1 += 1;
            }
        }
    }

    pub fn get(&self) -> &[T; N] {
        unsafe {
            &(*self.ptr).0
        }
    }
}

impl<T, const N: usize> Drop for RawPointer<T, N> {
    fn drop(&mut self) {
        unsafe {
            ptr::drop_in_place(self.ptr);
            dealloc(self.ptr as *mut u8, Layout::new::<([T; N], usize)>());
        }
    }
}

pub struct BoxStruct<T, const N: usize> {
    inner: Box<([T; N], usize)>,
}

impl<T: Copy, const N: usize> BoxStruct<T, N> {
    pub fn new(val: [T; N]) -> Self {
        Self {
            inner: Box::new((val, N)),
        }
    }

    pub fn push(&mut self, val: T) {
        if self.inner.1 < N {
            self.inner.0[self.inner.1] = val;
            self.inner.1 += 1;
        }
    }

    pub fn get(&self) -> &[T; N] {
        &self.inner.0
    }
}

pub struct BoxUnsafeCell<T, const N: usize> {
    inner: Box<UnsafeCell<([T; N], usize)>>,
}

impl<T: Copy, const N: usize> BoxUnsafeCell<T, N> {
    pub fn new(val: [T; N]) -> Self {
        Self {
            inner: Box::new(UnsafeCell::new((val, N))),
        }
    }

    pub fn push(&self, val: T) {
        unsafe {
            let inner = &mut *self.inner.get();
            if inner.1 < N {
                inner.0[inner.1] = val;
                inner.1 += 1;
            }
        }
    }

    pub fn get(&self) -> &[T; N] {
        unsafe {
            &(*self.inner.get()).0
        }
    }
}

fn main() {
    let initial = [1, 2, 0, 0];

    let uc_box = UnsafeCellBox::<i32, 4>::new(initial);
    uc_box.push(3);
    println!("UnsafeCellBox: {:?}", uc_box.get());

    let mut rp = RawPointer::<i32, 4>::new(initial);
    rp.push(3);
    println!("RawPointer:    {:?}", rp.get());

    let mut b = BoxStruct::<i32, 4>::new(initial);
    b.push(3);
    println!("BoxStruct:     {:?}", b.get());

    let buc = BoxUnsafeCell::<i32, 4>::new(initial);
    buc.push(3);
    println!("BoxUnsafeCell: {:?}", buc.get());
}

Both seem equivalent.

That UnsafeCell allows you to change the pointer itself, but it has no effect on the mutability of the array's contents.

How about these cases?

  • UnsafeCell<Box<T>>

  • *mut T

  • Box<T>

  • Box<UnsafeCell<T>>

As in the latest code example in my comment above.

When should I use which of these 4?

And what needs to be considered regarding their safety?

Your question is hard to answer because there are many different details, and any summary would be possibly misleading.

I would suggest simplifying it by imagining corresponding safe types. UnsafeCell<Box<T>> is like RwLock<Box<T>>, and Box<UnsafeCell<T>> is like Box<RwLock<T>>. Of course, you can do a few more things with the UnsafeCell, but those additional things are more like specific access patterns rather than fundamental differences in the amount of mutability and aliasability present.
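
The analogy can be sketched with the safe types themselves. The two helper functions are hypothetical; they only show which layer a shared reference is allowed to change in each nesting.

```rust
use std::sync::RwLock;

/// Lock on the outside: a shared reference is enough to swap in a new Box.
fn replace_box(outer: &RwLock<Box<i32>>, value: i32) {
    *outer.write().unwrap() = Box::new(value);
}

/// Lock on the inside: the Box stays fixed; only the T behind it changes.
fn set_inner(inner: &Box<RwLock<i32>>, value: i32) {
    *inner.write().unwrap() = value;
}

fn main() {
    let outer = RwLock::new(Box::new(1));
    replace_box(&outer, 2);
    assert_eq!(**outer.read().unwrap(), 2);

    let inner = Box::new(RwLock::new(1));
    set_inner(&inner, 3);
    assert_eq!(*inner.read().unwrap(), 3);
}
```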

BTW, this would be unsound (and UB), because you're allowing &self calls to create an arbitrary number of exclusive &mut Box references or &mut […] slices.

UnsafeCell is not recursive, and it still requires you to prevent data races and ensure there can't be aliased &mut ever.
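
One way to avoid that problem, as a sketch under stated assumptions: never hand out references into the cell, and return copies instead. `CellBuf` is a hypothetical type, not the code from the question, and the argument only holds because the type is single-threaded (it is !Sync through its UnsafeCell field).

```rust
use std::cell::UnsafeCell;

/// Hypothetical fixed-capacity buffer that is mutated through `&self`.
pub struct CellBuf<const N: usize> {
    state: UnsafeCell<([i32; N], usize)>,
}

impl<const N: usize> CellBuf<N> {
    pub fn new() -> Self {
        Self { state: UnsafeCell::new(([0; N], 0)) }
    }

    /// Appends `val`; returns false when the buffer is full.
    pub fn push(&self, val: i32) -> bool {
        // SAFETY: no method hands out a reference into `state`, and the type
        // is `!Sync`, so this `&mut` is the only live reference while it exists.
        let (items, len) = unsafe { &mut *self.state.get() };
        if *len == N {
            return false;
        }
        items[*len] = val;
        *len += 1;
        true
    }

    /// Returns a copy of the element, never a reference into the cell.
    pub fn get(&self, index: usize) -> Option<i32> {
        // SAFETY: same argument as in `push`; the borrow ends before returning.
        let (items, len) = unsafe { &*self.state.get() };
        if index < *len { Some(items[index]) } else { None }
    }
}

fn main() {
    let buf = CellBuf::<2>::new();
    assert!(buf.push(7));
    assert!(buf.push(8));
    assert!(!buf.push(9)); // the buffer is full
    assert_eq!(buf.get(1), Some(8));
    assert_eq!(buf.get(2), None);
}
```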

I think this can be generalised as: UnsafeCell<T> gives you a memory location you may write to without &mut's restrictions, and *mut T gives you the ability to perform the write there.

*mut T pointing to a regular Rust object (not inside the cell) still has to obey all the rules of Rust's mutability and aliasing.

Is it like this?

Box<UnsafeCell<T>>: Box is a pointer that guarantees only one active mutable pointer exists. It points to an UnsafeCell, which provides the ability to mutate data without &mut, allowing edits through &self. However, aliasing rules still apply: if an active &T into the contents exists and I mutate the data through &self, the result is Undefined Behavior.

UnsafeCell<Box<T>> provides the ability to edit the Box pointer itself through &self, such as replacing it with a new Box. However, it also allows editing T. The aliasing rules are the same as above.

Box<T> is a pointer to T. It can edit T, but only through &mut. Consequently, the aliasing rules are the same as for the two above, but the difference is that in this version, errors are caught at compile time. In the two above, violations result in Undefined Behavior (at runtime, or as reported by Miri).

Now I am confused about when to use Box + UnsafeCell if Box alone can already edit T. Both must follow the aliasing rules; the difference is that Box triggers a compile error while UnsafeCell results in Undefined Behavior. So which of the two should I use, and when?

My exercise was like this:

use std::cell::UnsafeCell;

struct Boxed {
    val: Box<[i32; 10]>
}
impl Boxed {
    fn edit(&mut self) {
        self.val[0] = 20;
    }
    fn reff(&self) -> &i32 {
        &self.val[0]
    }
}

struct BoxCell {
    val: Box<UnsafeCell<[i32; 10]>>
}
impl BoxCell {
    fn edit(&self) {
        unsafe {
            (*self.val.get())[0] = 20;
        }
    }
    fn reff(&self) -> &[i32] {
        unsafe { &(*self.val.get()) }
    }
}

struct CellBox {
    val: UnsafeCell<Box<[i32; 10]>>
}
impl CellBox {
    fn edit(&self) {
        unsafe {
            (*self.val.get())[0] = 20;
        }
    }
    fn reff(&self) -> &i32 {
        unsafe { &(*self.val.get())[0] }
    }
}

fn main() {

    let mut boxed = Boxed { val: Box::new([0; 10]) };
    let a = boxed.reff();
    //boxed.edit();
    println!("{}", a);
    
    let boxcell = BoxCell {
        val: Box::new(UnsafeCell::new([0; 10]))
    };
    let a = boxcell.reff();
    boxcell.edit();
   // println!("{:?}", a);
    
    let cellbox = CellBox {
        val: UnsafeCell::new(Box::new([0; 10]))
    };
    let a = cellbox.reff();
    cellbox.edit();
    //println!("{}", a);
    
}

Yeah, I just learned about that earlier :v

If you have a Box<UnsafeCell<T>>, it does allow you to do more than a Box<T>. Due to Box’s uniqueness rule, you can only mutate the T inside a Box<T> while you have, or could have, exclusively borrowed the Box<T> — it’s an aliasing violation to use an *mut T to mutate the T while the Box<T> is immutably borrowed. On the other hand, if you have Box<UnsafeCell<T>>, then you can get an *mut T from the UnsafeCell and using it is not a violation of Box’s rules unless the Box is exclusively borrowed, because UnsafeCell<T> opts out of the T contents being subject to aliasing rules from its owner (except when the UnsafeCell is exclusively borrowed).
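
A minimal sketch of that difference, assuming single-threaded use. `set_shared` is a hypothetical helper that writes through a shared borrow of the Box, which would be an aliasing violation with a plain Box<i32>.

```rust
use std::cell::UnsafeCell;

/// Writes to the contents while the Box is only immutably borrowed.
fn set_shared(boxed: &Box<UnsafeCell<i32>>, value: i32) {
    let p: *mut i32 = boxed.get();
    // SAFETY: the contents sit inside an UnsafeCell and no reference to the
    // i32 exists while this write happens.
    unsafe { *p = value };
}

fn main() {
    let boxed = Box::new(UnsafeCell::new(1));
    let shared = &boxed; // the Box is immutably borrowed from here on
    set_shared(shared, 2);
    // SAFETY: nothing else is accessing the contents during this read.
    assert_eq!(unsafe { *shared.get() }, 2);
}
```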

However, most of the use cases of UnsafeCell are when you are defining a data type that allows mutation through &self and does not contain any pointers. For example, std::cell::Cell and std::sync::AtomicU32 are both implemented by containing an UnsafeCell, and neither one contains any pointers. In this case, the point isn’t some combination of Box and UnsafeCell, but the fact that the type can be mutated regardless of whether it is owned through a pointer.

If you do have a pointer as part of your type, then UnsafeCell is no longer necessary. Still, there are differences between Box<UnsafeCell<T>> and *mut T:

  • If you have &mut Box<UnsafeCell<T>> then you can safely get an &mut T. This is not true for *mut T.
  • When Box<UnsafeCell<T>> is dropped, it drops the T and frees the heap allocation. This is not true for *mut T.

These are both true because having the exclusive borrow &mut Box<UnsafeCell<T>> implies that you can obtain the exclusive borrow &mut T, whereas having &mut *mut T does not imply such permission (nor does it forbid it). *mut T is “do what you want” and Box<UnsafeCell<T>> is not, so Box<UnsafeCell<T>> can make more assumptions and do more for you automatically, but some of those assumptions might be wrong. (For example, if you wanted to make your own version of Rc<T>, you would have to use *mut T because Box’s uniqueness conflicts with the goal of sharing pointers.)
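
Both bullet points can be sketched as follows. `bump` is a hypothetical helper, and the Rc strong count is used only as a convenient way to observe when a value is dropped.

```rust
use std::cell::UnsafeCell;
use std::rc::Rc;

/// Safe: an exclusive borrow of the Box implies exclusive access to the T.
fn bump(boxed: &mut Box<UnsafeCell<i32>>) {
    *boxed.get_mut() += 1;
}

fn main() {
    let mut boxed = Box::new(UnsafeCell::new(1));
    bump(&mut boxed);
    assert_eq!(*boxed.get_mut(), 2);

    // Dropping the Box drops the T: the strong count goes back to 1.
    let token = Rc::new(());
    let owner = Box::new(UnsafeCell::new(Rc::clone(&token)));
    assert_eq!(Rc::strong_count(&token), 2);
    drop(owner);
    assert_eq!(Rc::strong_count(&token), 1);

    // A raw pointer does nothing on its own: the clone stays alive until
    // the allocation is turned back into a Box and dropped.
    let raw: *mut Rc<()> = Box::into_raw(Box::new(Rc::clone(&token)));
    assert_eq!(Rc::strong_count(&token), 2);
    // SAFETY: `raw` came from Box::into_raw and is reclaimed exactly once.
    drop(unsafe { Box::from_raw(raw) });
    assert_eq!(Rc::strong_count(&token), 1);
}
```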

Now I can distinguish between them.

Regarding the rules, is the bottom line that in both safe and unsafe Rust, whether using UnsafeCell or *mut T, I cannot:

  • Mutate T if there is still an active &T.
  • Have two &mut to T, and then mutate through one &mut while the other &mut is still active.

In other words, *mut T does not suddenly allow two &mut to be active simultaneously; it only disables the compile-time error. It is still a violation, and if violated, it results in Undefined Behavior.

So, I can just apply the safe Rust coding model while writing unsafe Rust to make unsafe Rust safe, with the difference being that the compiler won't give me a compile-time error if I make a mistake?

Are there any other safety rules for UnsafeCell and *mut T besides those?

Yes, that’s right.

No, that’s not the difference. The difference is that unsafe Rust has some additional features whose correct use isn’t statically checked, so you have to follow the rules of Rust yourself instead of letting the compiler check.

So, Unsafe Rust lets you make use of an *mut T, but *mut T is a different type than &mut T and if you create an &mut T from *mut T, you still have all the &mut obligations to follow.

But Unsafe Rust does not turn off static checks — the only programs which start compiling when you add unsafe {} are programs that contain unsafe operations. For example, unsafe does not turn off borrow checking — it only allows you to use raw pointers which are never subject to borrow checking.
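
As a rough illustration: the commented-out snippet below is rejected by the borrow checker with or without the unsafe block, while the raw-pointer version compiles because dereferencing a raw pointer is the operation that unsafe unlocks. `bump_twice` is a hypothetical function.

```rust
/// Increments `value` twice through a raw pointer.
fn bump_twice(mut value: i32) -> i32 {
    // This does not compile, even inside `unsafe`, because references
    // are still borrow checked there:
    //
    //     unsafe {
    //         let first = &mut value;
    //         let second = &mut value; // error[E0499]
    //         *first += 1;
    //         *second += 1;
    //     }

    // Raw pointers are not borrow checked.
    let p: *mut i32 = &mut value;
    // SAFETY: `p` points to a live i32 and no reference to it is in use.
    unsafe {
        *p += 1;
        *p += 1;
    }
    value
}

fn main() {
    assert_eq!(bump_twice(1), 3);
}
```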

Does that mean I can do anything with *const T and *mut T as long as I don't introduce &T or &mut T, which would bring in Rust’s reference/borrowing rules? Or do the rules also apply to raw pointers, even without promoting them to references?

Are there any downsides to this? Since the compiler has no guarantee that raw pointers never alias (unlike references), is it unable to optimize as aggressively, so that references end up performing better than raw pointers?

Or does the compiler assume I am following the same rules and perform the same optimizations anyway, with the consequence being that if I don't actually follow them, it results in Undefined Behavior?

Such as in this code:

fn main() {
    let mut a = 10;
    
    let b = &raw mut a;
    let c = &raw mut a;
    let d = &raw const a;
    
    unsafe {
        *b = 20;
        println!("{}", a);
        println!("{}", *d);
        *c = 30;
        println!("{}", a);
    }
    
}

And

fn main() {
    let mut a = String::from("a");
    
    let b = a.as_mut_ptr();
    let c = a.as_mut_ptr();
    let d = a.as_ptr();
    
    unsafe {
        *b = b'b';
        println!("{}", a);
        println!("{}", *d);
        *c = b'c';
        println!("{}", a);
    }
    
}

Miri is not happy with that:

20
error: Undefined Behavior: trying to retag from <222> for SharedReadOnly permission at alloc101[0x0], but that tag does not exist in the borrow stack for this location
--> src/main.rs:11:24
|
11 |         println!("{}", *d);
|                        ^^ this error occurs as part of retag at alloc101[0x0..0x4]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
= help: see  for further information
help: <222> was created by a SharedReadOnly retag at offsets [0x0..0x4]
--> src/main.rs:6:13
|
6 |     let d = &raw const a;
|             ^^^^^^^^^^^^
help: <222> was later invalidated at offsets [0x0..0x4] by a write access
--> src/main.rs:9:9
|
9 |         *b = 20;
|         ^^^^^^^

So the rules also apply to raw pointers, even without converting them into references:

  • I must not dereference a mutable pointer if there is an active immutable pointer to the same variable.
  • I must not dereference a mutable pointer if there is another active mutable pointer to the same variable.

What about the optimization part?

You can organize, copy, send, and read or write through those raw pointers, with respect to each other, freely. For example, you can have "overlapping" *mut pointers, unlike &mut references. You cannot use those raw pointers past, broadly speaking, the time it would have been valid to use a single reference you created instead.

Yes, exactly; such optimizations are an advantage of using references in cases where it is correct to use references. In particular, you may use raw pointers to implement some sort of sharing or subdivision that the borrow checker does not understand, but then convert to a reference in order to perform a single operation on part of that shared or subdivided memory. This also allows you to make more of your code safe code that accepts references, which reduces the opportunity for mistakes.
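
That pattern can be sketched as a hand-written version of what the standard library's split_at_mut does. `split_halves` and `fill` are hypothetical names: raw pointers do the subdivision, and the rest of the code is safe and only sees ordinary references.

```rust
/// Splits a slice into two non-overlapping mutable halves.
fn split_halves(data: &mut [i32]) -> (&mut [i32], &mut [i32]) {
    let len = data.len();
    let mid = len / 2;
    let ptr = data.as_mut_ptr();
    // SAFETY: both ranges lie inside `data` and do not overlap, so the two
    // `&mut` slices never alias each other.
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

/// Safe code that only sees an ordinary exclusive reference.
fn fill(part: &mut [i32], value: i32) {
    part.iter_mut().for_each(|x| *x = value);
}

fn main() {
    let mut data = [0; 5];
    let (left, right) = split_halves(&mut data);
    fill(left, 1);
    fill(right, 2);
    assert_eq!(data, [1, 1, 2, 2, 2]);
}
```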

This code is incorrect not because of rules that apply to raw pointers, but because of rules that apply to borrowing a. When you write to *b, you invalidate the &raw const immutable borrow that created the pointer d, so you can no longer use d. This is why your program doesn’t pass Miri. But this is specifically because it is a &raw const, which makes the claim that the value won't be mutated; if you change the &raw const a to &raw mut a, then Miri approves — even if you then convert the resulting *mut i32 pointer to *const i32. When working with raw pointers, what matters is when and where you obtained them, not how you pass or copy them.
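
A sketch of the change described above, applied to the first snippet: `d` is created with &raw mut and only afterwards converted to *const i32. The function name `run` is hypothetical, and the claim that Miri accepts this form comes from the explanation above rather than from a run recorded here.

```rust
/// Returns (value read via `d` after `*b = 20`, value read via `d` after
/// `*c = 30`, final value of `a`).
fn run() -> (i32, i32, i32) {
    let mut a = 10;

    let b = &raw mut a;
    let c = &raw mut a;
    // Created as a mutable raw borrow, then converted; no read-only claim
    // about `a` is made at the point of creation.
    let d: *const i32 = (&raw mut a).cast_const();

    // SAFETY: all three pointers refer to the live local `a`, and no
    // references to `a` are in use while they are dereferenced.
    unsafe {
        *b = 20;
        let after_b = *d;
        *c = 30;
        let after_c = *d;
        (after_b, after_c, a)
    }
}

fn main() {
    assert_eq!(run(), (20, 30, 30));
}
```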