Is there any way to use raw pointers safely?

Hello,

I'm relatively new to Rust and I have a question about raw pointers and their "safe" use.

From what I understand, pointers can be a great tool to simplify (and speed up) code in certain circumstances, but I fail to understand how they can fit "safely" (or unsafely) into the language.

One use case I've seen is trees with parent pointers, where you have something like this:

```rust
struct Node {
  left: Option<Box<Node>>,
  right: Option<Box<Node>>,
  parent: *mut Node,
}
```

Now, reading the Rustonomicon, I see that pointers can essentially be assumed not to alias anything that is not itself a pointer (variables, references). So my question is: when you have something like the above, where two things point to the same memory (namely a raw pointer and a Box), how is it possible to use it at all?

I've read (in blogs/articles) that if you use something like the above, you only need to be sure you never hold two references to the same data at the same time (at least one of them mutable) and you'll be fine, but this seems impossible to prove in many cases.

For example, if I write a function that takes a Node and accesses the parent pointer (and only the pointer), then within that function, yes, there are no other references to the parent. But in the calling context there could be references that stay alive across the function call. Vetting your entire call graph to make sure you never hold a reference to an object across such a call seems very difficult and not maintainable.

So it seems to me that the pattern above is not usable in practice, and that the only safe way (if you want to use pointers) is to go with full-blown raw pointers, like:

```rust
struct Node {
  left: *mut Node,
  right: *mut Node,
  parent: *mut Node,
}
```

and allocate the memory either manually or with Box, converting it into a raw pointer (e.g. with Box::into_raw).

Does what I just said make sense, or am I missing something? Is code like the above, which mixes Box and raw pointers, invalid most of the time?

Is there a resource on how to use pointers in a way that doesn't potentially cause UB in your code?

Thanks

The Rustonomicon is a must-read if you are even thinking of writing unsafe code.

https://doc.rust-lang.org/nomicon/README.html

Hi Krishna,

As I mentioned above, I did read it; it actually raised more questions in my head than I had before :smiley:

Ah, sorry, I seem to have missed that when reading your question.

You can use Rc/Arc for the child pointers and Weak for the parent pointer to implement this up-reference pattern. Note that in your code the Boxes are not the unique entry point to a node: the Box pointers are aliased by the parent pointers, so in general you can't mutate the contents of a node behind the raw pointer.
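A minimal sketch of the Rc/Weak approach, assuming a single-threaded tree (use Arc, sync::Weak and a Mutex/RwLock instead for threads). The layout is modeled on the Rc/Weak tree in The Rust Programming Language book; `add_child` is a hypothetical helper, and a `Vec` of children stands in for `left`/`right`:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Children are owned through Rc; the parent link is a non-owning Weak,
// so there is no reference cycle and no raw pointer anywhere.
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

impl Node {
    fn new(value: i32) -> Rc<Node> {
        Rc::new(Node {
            value,
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(Vec::new()),
        })
    }
}

// Hypothetical helper: attach `child` under `parent`.
fn add_child(parent: &Rc<Node>, child: Rc<Node>) {
    *child.parent.borrow_mut() = Rc::downgrade(parent);
    parent.children.borrow_mut().push(child);
}

fn main() {
    let root = Node::new(1);
    let leaf = Node::new(2);
    add_child(&root, Rc::clone(&leaf));

    // Walk up from the leaf: upgrade() returns None if the parent was dropped.
    let parent_value = leaf.parent.borrow().upgrade().map(|p| p.value);
    println!("leaf's parent: {:?}", parent_value);
}
```

All the aliasing bookkeeping is done at runtime by Rc's reference counts and RefCell's borrow flags, so misuse panics instead of causing UB.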


So is it correct to say that if you use raw pointers, then essentially everything needs to be pointer-based, or you could end up in "bad" land (as in the example above)? Can we also say that the kind of code I described earlier, which I actually see around a lot (which is scary), is broken, and that if you use raw pointers you need to use them all the way (and manage the memory yourself, I guess)?

Thanks!


It's not that bad: there are actually ways to go from references (and other pointer-like entities from Safe Rust, like Box) to raw pointers and back. You just need to be careful when using them...
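A minimal sketch of the reference-to-raw-pointer round trip (for Box, the equivalent pair is `Box::into_raw` / `Box::from_raw`). `bump_via_raw` is a hypothetical helper; the point is where the `unsafe` obligation appears:

```rust
/// Hypothetical helper: adds one to `*x` by round-tripping the reference
/// through a raw pointer and back.
fn bump_via_raw(x: &mut i32) {
    // Reference -> raw pointer: this conversion is safe on its own.
    let p: *mut i32 = x as *mut i32;
    // Raw pointer -> reference: this needs `unsafe`, because *we* now
    // promise that `p` is valid and that `x` is not used while `back` lives.
    let back: &mut i32 = unsafe { &mut *p };
    *back += 1;
}

fn main() {
    let mut value = 10;
    bump_via_raw(&mut value);
    println!("{}", value);
}
```

The "being careful" part is exactly the promise in the second comment: while `back` is in use, nothing else may touch the same memory through another reference.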

For example, when implementing objects with shared mutability (from RefCell to concurrent data structures), UnsafeCell allows you to leave your memory management work to safe code, and to use raw pointers only in the part of your code that actually requires shared mutability. On the user's side, the trick will be mostly invisible, aside, of course, from the fact that data can now be mutated via a shared reference. The objects will look normal, and users will be able to allocate them on the heap via Box, Vec, etc.
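A minimal sketch of such a primitive, assuming a single-threaded, `Copy`-only design similar in spirit to `std::cell::Cell`. `MyCell` is a hypothetical name, not a std type; the unsafe code is confined to two one-line method bodies:

```rust
use std::cell::UnsafeCell;

/// Hypothetical minimal Cell-like type, for illustration only.
/// Soundness relies on two properties of the API:
///  - it never hands out a reference to the inner value, only copies;
///  - it is automatically !Sync (because UnsafeCell is), so it can't be
///    shared across threads.
pub struct MyCell<T> {
    value: UnsafeCell<T>,
}

impl<T: Copy> MyCell<T> {
    pub fn new(value: T) -> Self {
        MyCell { value: UnsafeCell::new(value) }
    }

    pub fn get(&self) -> T {
        // SAFETY: single-threaded, and no reference to the inner value
        // ever escapes, so no write can be happening concurrently.
        unsafe { *self.value.get() }
    }

    pub fn set(&self, value: T) {
        // SAFETY: same reasoning; no other reference to the inner value
        // can be alive, because this type never creates one.
        unsafe { *self.value.get() = value }
    }
}

fn main() {
    // Users allocate it like any other value, e.g. on the heap via Box.
    let cell = Box::new(MyCell::new(1));
    let (a, b) = (&cell, &cell); // two shared references, no &mut anywhere
    a.set(5);
    println!("{}", b.get());
}
```

Adding a method like `fn get_ref_mut(&self) -> &mut T` to this type would be exactly the kind of API that lets users create two `&mut` to the same data.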

On the other hand, you will need to be careful when implementing one of those shared-mutability primitives: if you ever provide an API that allows the user to obtain two &mut to the same data from it (as UnsafeCell trivially allows you to), it's instant UB kaboom!


You are completely right, and this is one of the things that make non-aliasing pointers such as Box or &mut very tricky to use correctly when there are raw pointers around.

Thus, I would advise to follow @Hyeonu's advice and use Rc/Arc with Weak, for a safe construction.

But if you wanted to go the unsafe path anyway, then you would indeed need to avoid keeping the pointers as Boxes: convert them into raw pointers at construction and turn them back into Boxes on Drop.
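A minimal sketch of that approach, assuming a binary tree where every link (`left`, `right`, `parent`) is a `*mut Node`. `Tree`, `alloc_node`, `add_left` and `free_subtree` are hypothetical names; only `Box::into_raw` and `Box::from_raw` are std APIs:

```rust
use std::ptr;

// Every link is a raw pointer; no node is held as a Box while the tree is
// alive, so the parent pointers cannot invalidate a Box's uniqueness.
struct Node {
    value: i32,
    left: *mut Node,
    right: *mut Node,
    parent: *mut Node,
}

struct Tree {
    root: *mut Node,
}

// Allocate with Box, then immediately "rawify" it with Box::into_raw.
fn alloc_node(value: i32, parent: *mut Node) -> *mut Node {
    Box::into_raw(Box::new(Node {
        value,
        left: ptr::null_mut(),
        right: ptr::null_mut(),
        parent,
    }))
}

impl Tree {
    fn new(value: i32) -> Self {
        Tree { root: alloc_node(value, ptr::null_mut()) }
    }
}

/// Hypothetical helper: creates a left child under `parent` and returns it.
/// Safety: `parent` must be a live node of a live Tree with no left child.
unsafe fn add_left(parent: *mut Node, value: i32) -> *mut Node {
    let child = alloc_node(value, parent);
    unsafe { (*parent).left = child };
    child
}

/// Frees a subtree by turning each raw pointer back into a Box.
/// Safety: `node` is null or came from `alloc_node`, and is freed only once.
unsafe fn free_subtree(node: *mut Node) {
    if node.is_null() {
        return;
    }
    let boxed = unsafe { Box::from_raw(node) };
    unsafe {
        free_subtree(boxed.left);
        free_subtree(boxed.right);
    }
    // `boxed` goes out of scope here, deallocating this node.
}

impl Drop for Tree {
    fn drop(&mut self) {
        // SAFETY: the tree owns every node reachable from `root`.
        unsafe { free_subtree(self.root) };
    }
}

fn main() {
    let tree = Tree::new(1);
    let child = unsafe { add_left(tree.root, 2) };
    // Follow the parent pointer back up, with no Box in the way.
    let parent_value = unsafe { (*(*child).parent).value };
    println!("{}", parent_value);
}
```

Boxes only exist briefly, at allocation and during Drop, which is what keeps them from ever coexisting with the aliasing parent pointers.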


Thanks @Yandros, that's what I thought. So you need to be very careful with all the pointer stuff :slight_smile:

This is important information, and it should probably be emphasized more in the documentation, because it's not clear how dangerous this can be. Many of the people looking at Rust whom I've spoken with seem pretty confident that you can use all this stuff much more "liberally" than you actually can.

(By the way, and joining @HadrienG here, the &'_ mut T case can be solved by using &'_ UnsafeCell<T> instead)
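A small sketch of the difference, assuming a single-threaded caller. `add_into` and `add_into_cell` are hypothetical functions; the point is that two `&UnsafeCell<i32>` arguments may legally refer to the same cell, which two `&mut i32` arguments never can:

```rust
use std::cell::UnsafeCell;

// `&mut i32` parameters are promised never to alias; safe code cannot
// even produce two `&mut` to the same integer to pass here.
fn add_into(a: &mut i32, b: &mut i32) {
    *a += *b;
}

// `&UnsafeCell<i32>` parameters MAY point to the same cell, so the
// compiler cannot assume `a` and `b` are distinct.
fn add_into_cell(a: &UnsafeCell<i32>, b: &UnsafeCell<i32>) {
    // SAFETY: single-threaded, and no reference to either inner value
    // is alive while we go through the raw pointers from `get()`.
    unsafe {
        let rhs = *b.get();
        *a.get() += rhs;
    }
}

fn main() {
    let cell = UnsafeCell::new(3);
    add_into_cell(&cell, &cell); // the same cell twice: allowed
    println!("{}", cell.into_inner()); // 6

    let (mut x, mut y) = (3, 4);
    add_into(&mut x, &mut y);
    println!("{}", x); // 7
}
```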

Thanks @HadrienG for your reply.

What exactly is the method though? What do you need to be careful about?

From what I understand from the documentation I found about UnsafeCell, it's not special in any way compared with other types, am I correct? But if it's the only thing pointing to an object, and the only interface for getting that object out is through RefCell (and RefCell counts how many borrows of each kind are around), then that's how you make sure there's only one thing that can be used to access the object at any one time. Am I correct there?
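The counting part can be observed directly with RefCell's public API. A small sketch using the non-panicking `try_borrow` / `try_borrow_mut` variants (plain `borrow` / `borrow_mut` would panic instead of returning `Err`):

```rust
use std::cell::RefCell;

fn main() {
    let cell = RefCell::new(vec![1, 2, 3]);

    {
        let r1 = cell.borrow(); // shared borrow #1
        let r2 = cell.borrow(); // shared borrow #2: fine
        // While shared borrows are live, a mutable borrow is refused.
        assert!(cell.try_borrow_mut().is_err());
        println!("{} {}", r1.len(), r2.len());
    } // r1 and r2 are dropped here, releasing their borrows

    // Now exactly one mutable borrow is allowed...
    let mut w = cell.borrow_mut();
    w.push(4);
    // ...and any further borrow is refused until `w` is dropped.
    assert!(cell.try_borrow().is_err());
    drop(w);

    println!("{:?}", cell.borrow());
}
```

So RefCell permits either many shared borrows or a single mutable one, checked at runtime; the UnsafeCell inside it is what makes mutating through `&RefCell<T>` legal in the first place.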

UnsafeCell<T> is special, as it is the only way to mutate behind a shared reference (&T). Every other way is UB. Everything in std that mutates behind a &_ uses UnsafeCell internally. UnsafeCell<T> turns off optimizations that assume that a &_ means immutable.


> UnsafeCell<T> is special, as it is the only way to mutate behind a shared reference (&T). Every other way is UB.

I frequently see things written to this effect, but it is a bit incorrect. There is no way at all to mutate a T through a &T. Not even with UnsafeCell<T>; because in that case, you're mutating T through an &UnsafeCell<T>.

Now, UnsafeCell is a lang item, so it must have some compiler magic associated with it. However, I did at some point try to locate where exactly in the rustc code the compiler actually uses this lang_item during lowering to LLVM IR, and was not able to find anything (all I could find were places where UnsafeCell must be treated specially during const evaluation).


I would expect the magic to somehow reside in the LLVM annotations that are placed on the pointers that are created from the UnsafeCell. In LLVM-speak, most Rust references should receive a noalias annotation indicating absence of mutable aliasing, but references to an UnsafeCell should not.

However, LLVM's noalias has historically been pretty broken, so it may well be that the annotation is currently disabled for Rust references as well... especially as that latter issue that I just found again (and am actually no stranger to :wink:) is still open.

That would explain why you saw no difference in the LLVM IR.

There's an internal auto trait Freeze which is un-implemented for UnsafeCell:

https://github.com/rust-lang/rust/blob/4b65a86ebace8600c8e269e8bfe3365cdc460e68/src/libcore/marker.rs#L584-L591

Then immutable references can be either PointerKind::Frozen or Shared:
https://github.com/rust-lang/rust/blob/4b65a86ebace8600c8e269e8bfe3365cdc460e68/src/librustc/ty/layout.rs#L2115-L2123

And this affects the argument attributes that should make it to LLVM IR:

https://github.com/rust-lang/rust/blob/4b65a86ebace8600c8e269e8bfe3365cdc460e68/src/librustc/ty/layout.rs#L2713-L2728

For example:

```rust
use std::cell::Cell;

pub fn foo(_: &Cell<i32>) {}
pub fn bar(_: &i32) {}
```

```llvm
; playground::foo
; Function Attrs: nonlazybind uwtable
define void @_ZN10playground3foo17hc101073ff22051efE(i32* align 4 dereferenceable(4)) unnamed_addr #0 !dbg !5 {
start:
  %arg0 = alloca i32*, align 8
  store i32* %0, i32** %arg0, align 8
  call void @llvm.dbg.declare(metadata i32** %arg0, metadata !22, metadata !DIExpression()), !dbg !23
  ret void, !dbg !24
}

; playground::bar
; Function Attrs: nonlazybind uwtable
define void @_ZN10playground3bar17h85d9491b10a44bb7E(i32* noalias readonly align 4 dereferenceable(4)) unnamed_addr #0 !dbg !25 {
start:
  %arg0 = alloca i32*, align 8
  store i32* %0, i32** %arg0, align 8
  call void @llvm.dbg.declare(metadata i32** %arg0, metadata !29, metadata !DIExpression()), !dbg !30
  ret void, !dbg !31
}
```

(only the latter has noalias readonly)


Thanks to all for the very detailed answers. This is all great information, and the explanation of UnsafeCell is especially helpful; I didn't get that before. Now I have a better idea of what to do with unsafe :slight_smile:
