How does Rust define null pointers?

The Rust Reference ("Behavior considered undefined") says:

Rust code is incorrect if it exhibits any of the behaviors in the following list.

Evaluating a dereference expression (*expr) on a raw pointer that is dangling or unaligned,

A dangling pointer is defined as:

A reference/pointer is "dangling" if it is null or not all of the bytes it points to are part of the same live allocation (so in particular they all have to be part of some allocation).

In C, a null pointer is defined as (void*)0, and in C++ as 0 or (T*)0. I couldn't find the corresponding rule for how Rust defines it.

Consider this scenario: on a certain embedded platform, every function is triggered by writing a function code to address 0x0. Implementing this would then require code like the following:

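```rust
fn main() {
    // Function codes understood by the device at address 0x0.
    let codes: [u8; 2] = [0x1 /* open the door */, 0x2 /* close the door */];
    unsafe {
        // A write needs `*mut`; writing through `*const u8` does not compile.
        let ptr = 0x0 as *mut u8;
        *ptr = codes[0]; // open the door
    }
}
```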

According to the Rust Reference, if Rust defines 0x0 as null, this code has UB and is incorrect. However, the platform requires exactly this kind of access. So what is the behavior in this context, and how can we guarantee that the code does what is expected on this platform?


It's documented to be address 0 as of Rust 1.75.
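A minimal sketch of what that means in practice, using only stable std APIs (std::ptr::null, std::ptr::null_mut, is_null). The helper names null_const_addr and null_mut_addr are hypothetical, made up for illustration: casting the null pointer to an integer gives 0.

```rust
use std::ptr;

/// Integer address of the null `*const u8` (hypothetical helper).
fn null_const_addr() -> usize {
    ptr::null::<u8>() as usize
}

/// Integer address of the null `*mut u32` (hypothetical helper).
fn null_mut_addr() -> usize {
    ptr::null_mut::<u32>() as usize
}

fn main() {
    println!("ptr::null()     -> {:#x}", null_const_addr());
    println!("ptr::null_mut() -> {:#x}", null_mut_addr());
    // `is_null` is the supported way to test for the null pointer.
    assert!(ptr::null::<u8>().is_null());
}
```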

It's just as UB in C and C++ as it is in Rust. You could use inline assembly to do it.
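A rough sketch of the inline-assembly route, assuming an x86_64 or aarch64 target (other architectures need their own store instruction, and the code will not compile there as written). The function name store_byte is hypothetical. The store is issued by the asm! block, so the access itself is not a Rust dereference. So that the demo can run on an ordinary host, it writes to a local variable. On the hypothetical device you would pass the device address instead.

```rust
use std::arch::asm;

/// Store one byte at `addr` with a single machine store instruction emitted
/// by inline assembly (hypothetical helper).
///
/// # Safety
/// `addr` must be writable on the target hardware.
#[cfg(target_arch = "x86_64")]
unsafe fn store_byte(addr: *mut u8, val: u8) {
    unsafe {
        asm!(
            "mov byte ptr [{addr}], {val}",
            addr = in(reg) addr,
            val = in(reg_byte) val,
            options(nostack, preserves_flags),
        );
    }
}

#[cfg(target_arch = "aarch64")]
unsafe fn store_byte(addr: *mut u8, val: u8) {
    unsafe {
        asm!(
            "strb {val:w}, [{addr}]",
            addr = in(reg) addr,
            val = in(reg) val,
            options(nostack, preserves_flags),
        );
    }
}

fn main() {
    // On the device this would be `store_byte(0 as *mut u8, 0x1)`.
    // Here a local stands in for the register so the sketch runs anywhere.
    let mut door: u8 = 0;
    unsafe { store_byte(&mut door as *mut u8, 0x1) };
    println!("door = {door:#x}");
}
```

Because the pointer is passed into the asm block and the block is not marked nomem, the compiler has to assume the memory may have changed and re-read door afterwards.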


Does that mean we are forbidden from doing this in Rust code, even in unsafe code, and the only valid way is to wrap the access in assembly?

Yes. Someday I hope SoC designers will stop mapping hardware at address 0.


Wait. How about this way?

```rust
fn main() {
    let addr: usize = 0; // zero held in a variable instead of a literal cast
    unsafe {
        let ptr = addr as *mut u8;
        *ptr = 0x1;
    }
}
```

Now, I think the behavior is implementation-defined?


No difference. It's still UB. Also, there is literally no way around this in pure Rust. If it helps, you can think of Rust's dereference operator as if it were implemented with an if pointer == NULL { invoke_undefined_behavior() } step built in. The memory access itself must happen outside Rust's memory model to have any chance of not triggering UB.
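That mental model can be written down as a small sketch. deref_model is a hypothetical function used only for illustration, not how rustc actually implements dereferencing, and it covers only the null case (the real rule also covers other dangling and unaligned pointers). std::hint::unreachable_unchecked is the real std function that tells the optimizer a path can never be taken.

```rust
use std::hint::unreachable_unchecked;

/// Mental model only (hypothetical): a raw-pointer read behaves as if the
/// null case were declared impossible before the access happens.
unsafe fn deref_model(ptr: *const u8) -> u8 {
    if ptr.is_null() {
        // Reaching this is UB, so the optimizer may assume `ptr` is non-null,
        // delete this branch, and also drop later null checks on `ptr`.
        unsafe { unreachable_unchecked() }
    }
    unsafe { *ptr }
}

fn main() {
    let byte: u8 = 0x2;
    // Fine: `&byte` is a valid, non-null pointer.
    let value = unsafe { deref_model(&byte) };
    println!("{value:#x}");
}
```

Under this model, choosing a different spelling for the address (a literal, a variable, or arithmetic) makes no difference, because the check is on the resulting pointer.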


Note that it's not just embedded. WASM also has this issue. But there is no hardware at that address, so it's enough to just stop putting real data there.

Given the way rustc invokes the linker on WebAssembly, the stack is placed first, so address 0 is effectively part of the stack. If you use even a single byte more stack than that, the stack pointer wraps around and you get a stack overflow. That overflow results in a wasm trap when address 2^32-1 is accessed, because that address is unmapped unless you actually allocate the full 4 GiB of memory.
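The address arithmetic behind that trap can be sketched as follows. This is a simplified model, not real wasm runtime code: the helper names are hypothetical, it assumes a downward-growing stack whose pointer has reached 0, and it uses the fact that wasm32 linear memory is sized in 64 KiB pages, with a maximum of 65,536 pages (4 GiB).

```rust
/// Size of one WebAssembly linear-memory page (64 KiB).
const WASM_PAGE_SIZE: u64 = 64 * 1024;

/// Where a 32-bit stack pointer ends up after growing the stack downward
/// by `bytes` (hypothetical helper): the address wraps on underflow.
fn stack_pointer_after(sp: u32, bytes: u32) -> u32 {
    sp.wrapping_sub(bytes)
}

/// Whether a one-byte access at `addr` is inside a memory of `pages` pages.
fn is_in_bounds(addr: u32, pages: u64) -> bool {
    (addr as u64) < pages * WASM_PAGE_SIZE
}

fn main() {
    let sp = stack_pointer_after(0, 1);
    println!("stack pointer after overflow: {sp:#x}"); // 2^32 - 1
    // With a small memory (17 pages here), the wrapped address is out of bounds.
    println!("in bounds with 17 pages: {}", is_in_bounds(sp, 17));
    // Only the maximum size covers the very last address.
    println!("in bounds with 65536 pages: {}", is_in_bounds(sp, 65_536));
}
```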

You can't do it in the Rust abstract machine, period. That's why people are suggesting asm!. It doesn't matter what your hardware does. More about how you have to follow the Rust rules no matter what chip you're using:

Also, with normal pointer writes, *ptr = 1; *ptr = 1; might only trigger the hardware once even if ptr were a valid address (thanks to desirable optimizations), so it's not what you want to use anyway.
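The usual tool for the merged-writes problem is std::ptr::write_volatile, which the compiler may not merge or drop. The sketch below assumes a hypothetical device register. A local variable stands in for it here so the code runs on a host. Volatile only solves the repeated-write problem. It is a separate question from the null-address rule discussed above.

```rust
use std::ptr;

/// Send each function code to a register with volatile writes, so every
/// store is emitted in program order (hypothetical helper).
///
/// # Safety
/// `reg` must meet `ptr::write_volatile`'s requirements (aligned, valid for writes).
unsafe fn send_codes(reg: *mut u8, codes: &[u8]) {
    for &code in codes {
        unsafe { ptr::write_volatile(reg, code) };
    }
}

fn main() {
    // Stand-in for the device register so the sketch runs on a host.
    let mut register: u8 = 0;
    // The repeated 0x1 is written twice; a plain `*reg = code` could merge it.
    let codes: [u8; 3] = [0x1 /* open the door */, 0x1, 0x2 /* close the door */];
    unsafe { send_codes(&mut register, &codes) };
    println!("last code written: {register:#x}");
}
```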


The fun thing is that the C standard allows platforms where the null pointer is not the all-zero bit pattern. Even so, the expression (void*)0 is guaranteed to produce a null pointer on such platforms (to support the NULL macro). How? The C compiler (not the preprocessor) is responsible for producing a null pointer whenever it encounters a cast of the constant 0 to a pointer type. Strictly, that means an integer constant expression with value 0, such as the literal 0, not an arbitrary integer expression that happens to evaluate to 0. The code below is not guaranteed to produce a null pointer on every platform:

```c
int zero = 0;
void *ptr = (void*)zero;
```
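For contrast, here is a small Rust sketch. Rust documents null as address 0, so it draws no literal-versus-runtime distinction like C's: casting any zero integer to a pointer gives a pointer that is_null reports as null. The helper to_ptr is hypothetical. std::hint::black_box is used only to make the zero opaque to the compiler, much like the int zero variable above. No dereference happens, so there is no UB here.

```rust
use std::hint::black_box;

/// Convert a run-time integer to a raw pointer (hypothetical helper; no dereference).
fn to_ptr(addr: usize) -> *const u8 {
    addr as *const u8
}

fn main() {
    // A zero the compiler treats as unknown, like `int zero = 0;` in the C example.
    let zero = black_box(0usize);
    println!("runtime zero is null: {}", to_ptr(zero).is_null());
    println!("0x1000 is null: {}", to_ptr(0x1000).is_null());
}
```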

It's amazing how far both ISO C and C++ bent over backwards to support unusual architectures. Until C++20, C++ allowed signed integer representations other than two's complement.


IMO, Rust does not have a spec as of now. Where can I find a document describing Rust's object model that a spec would adopt? Or is Rust's object model the same as, or similar to, C++'s?


You could try seeing what Ferrocene has documented: https://spec.ferrocene.dev/expressions.html#syntax_dereferenceexpression
