Safety of casting from `*mut T` to `*mut ()` to `*mut T` (+dynamic linking?)

Is it safe to cast a pointer in the way mentioned in the title, then access it? I've had some trouble finding detailed information on the safety of pointer casts.

Update+answer

Sounds like casting to () and back, then dereferencing, is safe. Even type punning is allowed, because Rust has no equivalent of C's "strict aliasing" rule. (But keep in mind that struct layout is largely unspecified unless you add #[repr(C)] or #[repr(transparent)]!) Source: smart guys in this thread and links from quinedot; Aria's blog also says so, and she knows a lot about Rust.

Things I've seen so far (warning: long)

  • Actually running my code that does this seems to work fine. Miri also thinks it's safe. Playground link
  • However it also thinks that, if I have struct Thing(u8), and cast from a *mut u32 to a *mut Thing, then dereference, it's safe. This seems like it's supposed to be undefined behavior, maybe it's something Miri can't detect. Playground link
  • If I have struct Thing(u64) and try to cast between a *mut u32 and a *mut Thing the compiler says "casting references to a bigger memory layout than the backing allocation is undefined behavior, even if the reference is unused". I don't even have to run Miri. Playground link
  • std::ptr is not very helpful for this.
  • Rust Reference: pointer-to-pointer cast says that when casting *mut T to *mut U that "If T and U are both sized, the pointer is returned unchanged."
  • Rustonomicon FFI gives an example that suggests casting between a C struct pointer and a Rust pointer to an unrelated struct is safe.

Root problem
If this sounds like an XY problem, maybe it is. My goal is to create an internet simulator of sorts, that can dynamically load users' code, from a lib, to add Machines to this internet. Similarly to how some video games can load mods at runtime. Basically, a plugin system. The need for *mut () comes from allowing users to create "methods" for their machines:

/// Machine interface
#[no_mangle]
pub unsafe extern "C" fn allocate_machine(buf: *mut ()) {}

#[no_mangle]
pub unsafe extern "C" fn send(machine: *mut (), msg: u8) {}

The main code then calls these functions.
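For illustration, here is a minimal sketch of how a plugin might implement those two entry points. The `Counter` type and the `demo` function are hypothetical and not from the original post, and the `#[no_mangle]` attributes are left out so the snippet stands alone. It also assumes the host hands the plugin storage that is large enough and suitably aligned for the plugin's state type.

```rust
use std::mem::MaybeUninit;

/// Hypothetical plugin-side machine state (not from the original post).
#[repr(C)]
pub struct Counter {
    total: u32,
}

/// Initializes a machine in caller-provided storage.
///
/// # Safety
/// `buf` must be valid for writes and sized and aligned for `Counter`.
pub unsafe extern "C" fn allocate_machine(buf: *mut ()) {
    // Cast the erased pointer back to its real type and initialize it in place.
    unsafe { buf.cast::<Counter>().write(Counter { total: 0 }) }
}

/// Delivers one message byte to the machine.
///
/// # Safety
/// `machine` must point to a `Counter` initialized by `allocate_machine`,
/// with no other live references to it.
pub unsafe extern "C" fn send(machine: *mut (), msg: u8) {
    let counter = unsafe { &mut *machine.cast::<Counter>() };
    counter.total += u32::from(msg);
}

/// Stand-in for the host: here it knows the type only to keep the sketch short.
pub fn demo() -> u32 {
    let mut slot = MaybeUninit::<Counter>::uninit();
    let handle: *mut () = slot.as_mut_ptr().cast();
    unsafe {
        allocate_machine(handle);
        send(handle, 3);
        send(handle, 4);
        slot.assume_init().total
    }
}
```

In a real host the state type would be unknown, so the plugin would also need some way to report the size and alignment it requires; that part is outside this sketch.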


The only UB part of this is casting an invalid *mut Thing into &mut Thing. You can do almost anything to raw pointers except overflowing math, and maybe casting to and from usize/isize. Your round-trip cast should be fine.
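A minimal sketch of the round trip in question, assuming the pointer starts out valid (the function name is made up for illustration):

```rust
/// Casts `*mut u32` -> `*mut ()` -> `*mut u32`, then writes and reads through it.
pub fn round_trip(value: &mut u32) -> u32 {
    let typed: *mut u32 = value;
    let erased: *mut () = typed.cast(); // type is erased, address is unchanged
    let restored: *mut u32 = erased.cast(); // back to the original pointee type
    // The access goes through the original type and the pointer is still valid,
    // so the dereference is sound.
    unsafe {
        *restored += 1;
        *restored
    }
}
```

Only the dereference needs `unsafe`; both casts are ordinary safe code.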


No, why? If the size of the target's referent is smaller than that of the source, and so is the alignment, then what could go wrong?

The only potential problem with this exact declaration is that the layout of structs is not guaranteed by default, so you must add #[repr(C)] or #[repr(transparent)] to be safe.
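As a sketch of that advice, here is the `Thing(u8)` case with `#[repr(transparent)]` added. The function name is hypothetical; which byte of the `u32` you get depends on the target's endianness, so the example compares against `to_ne_bytes` rather than a fixed value.

```rust
/// With `repr(transparent)`, `Thing` is guaranteed to have the layout of `u8`.
#[repr(transparent)]
pub struct Thing(pub u8);

/// Reads the first byte (in memory order) of a `u32` through a `*mut Thing`.
pub fn first_byte(word: &mut u32) -> u8 {
    let p = word as *mut u32 as *mut Thing;
    // `Thing` is no larger and no more strictly aligned than `u32`,
    // and every byte is a valid `u8`, so this read is in bounds and valid.
    unsafe { (*p).0 }
}
```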


Rust doesn't use Type-Based Alias Analysis like C, so changing the type of an object via pointer casts is not forbidden in itself.
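To make that concrete, here is a small sketch of reading an `f32` through a `*const u32`. It relies on `f32` and `u32` having the same size and on `u32` not being more strictly aligned than `f32`, which holds on mainstream targets. In real code the safe `f32::to_bits` does the same job.

```rust
/// Reinterprets the bytes of an `f32` as a `u32` via a pointer cast.
pub fn bits_of(x: f32) -> u32 {
    let p = &x as *const f32 as *const u32;
    // In bounds, aligned, and every bit pattern is a valid `u32`;
    // Rust has no type-based rule that forbids the access.
    unsafe { *p }
}
```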

On the C side it's safest to use char * for pointing to arbitrary other types' data, since char (any signedness) is exempt from the TBAA restrictions.


I was worried because Rust's rules follow from C somewhat, and it seems like the C standard says casting a pointer to one of another type, then accessing the value is UB because of "TBAA" or "strict aliasing."

An object shall have its stored value accessed only by an lvalue expression that has one of the following types 73) or 88): a type compatible with the effective type of the object, ...

As far as I can tell, C's 64-bit ints and 8-bit ints aren't compatible. (Of course, even if this were C, we could still cast thing* -> void* -> thing* and be safe.)

Though @kornel says that Rust doesn't use TBAA. I can see why Rust would be compiled this way, but if it's not too much trouble, is it possible you could provide a source? Thanks for the help guys.

In TBAA, notice the "alias analysis" part of that.

It's an important part of C++ compilers knowing that a write through an int* and a write through a float* don't alias.

Rust doesn't need that because it gets that alias analysis help from all &muts instead.
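A rough sketch of the difference, with made-up function names: in the first function both parameters have the same type, so type-based analysis would learn nothing, yet the compiler may still assume they do not overlap because they are `&mut`. The raw-pointer version carries no such guarantee, so passing the same pointer twice is allowed.

```rust
/// `a` and `b` are distinct `&mut`s, so they cannot alias even though the
/// types match; the compiler is free to assume the final read yields 1.
pub fn write_both(a: &mut i32, b: &mut i32) -> i32 {
    *a = 1;
    *b = 2;
    *a
}

/// Raw pointers may alias, so the value must really be re-read after the
/// second write.
///
/// # Safety
/// Both pointers must be valid for reads and writes.
pub unsafe fn write_both_raw(a: *mut i32, b: *mut i32) -> i32 {
    unsafe {
        *a = 1;
        *b = 2;
        *a
    }
}
```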

(And people constantly write code that gets it wrong in C++ too, so getting rid of it is also helpful on that front.)


I don't know of an official page that calls out the lack of such UB explicitly, but you can infer it from various other documentation, or just search GitHub for many comments pointing this out to others or taking it for granted.[1]


  1. Including by people who would know, not that you should have to know who they are to get this information.


Oh, yes, I already forgot that in C, you have pain and suffering derived from types, even though the language is very weakly typed.

No, that would also violate strict aliasing in C big time. Yes, but in fact you can go through any other pointer, and still be safe, if the final access/dereference occurs through the correct type.

The point of TBAA/strict aliasing is that you are not allowed to access (read/write) an object of type T through an lvalue of type U unless non-cv(T) == non-cv(U) (or U == signed or unsigned char).

Casting between arbitrary pointer types does not in itself violate, fix, or affect TBAA in any way. It's the access (ie. after dereferencing) that matters. You can never possibly fix this using intermediate pointer types in C. You have to use a union. T* -> void * -> U* isn't any better or worse than directly going T* -> U*. If T != U (modulo CV-qualification, as appropriate), these are equally invalid, and if T == U, they are equally valid.


Seriously? How could qsort be useful, then?

Passing a pointer to thing as a void* and then casting it back to thing* is standard practice in C. And that's exactly what we are discussing here; it's even in the text you quoted and rejected.

And you can't access anything directly via void* (unless you use a GCC extension), so how would that not be safe?

Seriously? Can you show me the relevant part of the standard? Type punning via union is another GCC extension and very much not guaranteed in clang (I know, I asked; the answer was: we try to recognize and support it where it's obvious, but there are no guarantees because it's a subtle and non-standard GCC extension… just look at the page where it's documented, for chrissake).

I would say that it's much simpler. All the work on Tree Borrows would be pointless if TBAA were part of Rust. Rust does have an array type (unlike C), and various tricks that work in C because of the lack of proper array types would stop working with TBAA.

Aria's blog post outlines the attempts to formalize Rust's memory model and what's being done to make sure it can still be used even before it's finished.


Sorry, I thought OP was referring to casting to a different type, because the previous post was about accessing objects via dereferencing a pointer to a different type. My claim is that dereferencing T* -> void * -> U* isn't any better than doing T* -> U* directly. Of course, T* -> void * -> T* is always fine (for object, i.e., non-function, pointers). But the latter doesn't contain any type punning, so it's completely orthogonal to the point being made.

The point is that type punning based on unions is at least not UB (in C). It's implementation-defined.

So while you can't possibly hope to achieve valid code when type punning is performed via pointers, it's at least potentially possible via unions if your implementation supports it. Therefore, my claim that you have to use unions if you even hope to avoid UB still stands.

Someone else already enumerated the exact sections in the standard for you.


To be completely clear, I was only bringing up the *mut u32 to *mut Thing cast because I thought it was undefined behavior, and was showing that Miri couldn't detect undefined behavior involving casts. (I see now that type punning with pointers is allowed in Rust.) Indeed, I'm not doing any type punning myself - just casting a pointer to () and back.

Also, thanks Kornel for the resource, nice to see a lot of related information in one place.


Fun fact: this isn't guaranteed by the language standard! According to the C language standard, it is UB to cast to a pointer type if the pointer is not sufficiently aligned for the cast-to target type. For the same case, C++ says that the resulting pointer value is unspecified.

Disclaimer: my source is cppreference, not the actual standard(s).
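For comparison, Rust draws the line elsewhere: producing a misaligned raw pointer with a cast is allowed, and alignment only matters when the pointer is dereferenced or turned into a reference. A sketch of reading through a possibly misaligned pointer with `read_unaligned` (the function name is hypothetical):

```rust
/// Reads a native-endian `u32` starting at `offset`, whatever its alignment.
pub fn read_u32_at(bytes: &[u8], offset: usize) -> u32 {
    assert!(bytes.len() >= 4 && offset <= bytes.len() - 4, "out of bounds");
    // The cast may yield a misaligned `*const u32`; that alone is fine.
    let p = unsafe { bytes.as_ptr().add(offset) }.cast::<u32>();
    // A plain `*p` would require alignment; `read_unaligned` does not.
    unsafe { p.read_unaligned() }
}
```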


Better to use https://eel.is/c++draft/, this would be close to what compiler developers are using.

Fun fact: no one in the C/C++ community cares about the actual standards. ISO sells standards for a living, which makes them hard to get, and that, in turn, means that not just C/C++ users but compiler developers too often don't have access to them and are forced to use drafts.

I have no idea whether there are significant differences between the standards and these drafts, but I wouldn't be surprised to find that, where they differ, most compilers implement what the draft says and not what the actual standard says.


Note that simply casting a pointer in Rust is safe basically by definition: you can do it without using unsafe {}, and everything you can do in safe Rust is unconditionally safe and cannot cause UB by itself. Any violation of that property is called "unsoundness", and is considered a critical bug.
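A small sketch of that point (the function name is made up): the cast below compiles without `unsafe`, even though the resulting pointer must never be dereferenced as a `u64`.

```rust
/// Safe code: produces a `*mut u64` that points at a 4-byte `u32`.
pub fn widen_ptr(x: &mut u32) -> *mut u64 {
    // No `unsafe` needed: the cast only changes the pointer's type.
    // Reading 8 bytes through the result, or writing `&mut *p`, would
    // need `unsafe` and would be UB, since only 4 bytes are backed.
    x as *mut u32 as *mut u64
}
```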

The real question is whether dereferencing the resulting pointer is safe (and note that creating a safe reference requires dereferencing). This may seem like a pedantic distinction, and, well, it is, but it's also an important one. In C/C++, even seemingly benign operations, such as casting a pointer without using it, can cause UB. For example, it is UB to simply read the value of a pointer which was freed, without any dereferences. A specific manifestation of that is that pointers passed to realloc are considered freed, and cannot be accessed. You must access the allocation only through the returned pointer, and doing something as simple as comparing the old and new pointer is considered UB.

void *old_ptr = malloc(1);
void *new_ptr = realloc(old_ptr, 2);
// The following check is UB!
if (old_ptr == new_ptr) {
  // do stuff if allocation wasn't moved
}

That's not because of a pointer cast, but because you create a reference from the new pointer. References in Rust are (generally) considered unconditionally dereferenceable by the compiler. Creating a reference which violates this property is instant UB, even if the reference isn't accessed in any way.


Reality is much more complex and sad, unfortunately. There are certain constructs, formally defined as UB, that actual compilers support and, crucially, there are also other constructs that are not supported even though the standard says they should be!

And yet… The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object has not been allocated.

What does the phrase "which may have the same value as a pointer to the old object" even mean if you are not supposed to even read the old pointer? I guess copying one pointer to another, passing the copy to realloc, and then using the original pointer is OK? Let's test that idea… nope, doesn't work.

Nobody knows for sure what one can and cannot do with pointers in C/C++.

That's the biggest issue of the C/C++ world: for decades compiler writers have treated the standard as holy gospel, used to bash C/C++ users over the head with strange and often very unnatural rules, and so now, when they face a situation where they need to change those rules… they can't.

Because this would immediately raise the question: so you have punished us, all the C/C++ developers who want to play certain bit-manipulation games, and castigated us and lorded it over us with something that's not holy and sacred, but is merely a piece of paper? Why is the “holy standard” only “holy and sacred” when it's time to explain why our programs, which worked for decades, are “actually invalid”, but no longer “holy and sacred” when it's time to change the rules for the compiler developers' sake?

It would be interesting to see how they resolve the issue, but I hope Rust will avoid that fate, because it doesn't pretend its own language spec is “holy and sacred”, and most developers very explicitly accept that “it works like this because that's what is written in the reference” is never a sufficient answer by itself: the reference is merely the best current compromise between the needs of various parties, and every part of it needs some external justification to exist.

That's a perfect example. The justification here is that sometimes it's a good idea to hoist code that reads through a reference out of a tight loop, even if that loop is never executed even once in some cases.

That's a valuable optimization, but there is ongoing discussion about whether that property is worth keeping. There are pros and cons, but the current consensus is that it's better to keep the rule for now, because relaxing it later is easier than making it stricter.

It's not a case of a “holy and sacred” standard; it's simply a compromise that Rust compiler developers and Rust users agreed upon. For now.

It may be changed later… and that ability to change it is very much an integral part of the game, but it's based on reciprocity: since everyone agrees that the reference is merely the best known description of the current compromise, everyone is free to propose changes… which will most likely be rejected, but with some sane justification, and not just “because by merely asking why the rules are what they are you commit sacrilege and must be expunged… begone!”
