Is this undefined behaviour?

Hi!

I'm trying to understand the semantics of Rust a bit better. And I was wondering: does this particular program exhibit undefined behaviour?

struct A { v: i32 }

fn test(u: A, v: A) -> i32 {
  let mut x : (A, A) = (u, v);
  let sec : *const A = &x.1;
  x.0 = x.1;
  unsafe {
    (*sec).v
  }
}

fn main() {
  let u : A = A { v: 1 };
  let v : A = A { v: 2 };
  let _ret = test(u, v);
}

The raw pointer sec refers to the second element of the tuple. That element has been moved out by x.0 = x.1, and pointers are supposedly only valid when pointing to "live" values.
However, the value is still on the stack, so it can't really be dropped before the function returns.
Compiling and running this code with the normal compiler, on my machine or in the playground, works and shows that _ret = 2, and Miri doesn't report an error either.

Thank you for your answers!

But struct A is not Copy, and the value being moved here is an A, not an integer.

There isn't any UB here because you're copying the field, which is an integer and is therefore trivially copyable.

Edit: as mentioned by @H2CO3, the tuples make it odd to read; but this is only UB if A is not Copy.
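For comparison, here is a minimal sketch of the variant everyone seems to agree on: the same function with A deriving Copy. The name read_after_copy is made up for illustration (it mirrors test from the original post), and the function returns both fields only so that nothing is left unused. With Copy, x.0 = x.1 is a copy, so x.1 stays initialized and the read through sec is fine.

```rust
#[derive(Clone, Copy)]
struct A {
    v: i32,
}

// Hypothetical name; same shape as `test` in the original post.
fn read_after_copy(u: A, v: A) -> (i32, i32) {
    let mut x: (A, A) = (u, v);
    let sec: *const A = &x.1;
    // A is Copy here, so this copies x.1 and leaves x.1 initialized.
    x.0 = x.1;
    // SAFETY: sec points to x.1, which is still live and initialized.
    let via_ptr = unsafe { (*sec).v };
    (x.0.v, via_ptr)
}

fn main() {
    let (first, second) = read_after_copy(A { v: 1 }, A { v: 2 });
    println!("{first} {second}");
}
```

The open question in this thread is only about what happens when the derive is removed.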

Oh right, the code is somewhat confusing due to the superfluous layer of tuples. In that case, this is indeed UB.

I agree it is a cumbersome example, but it's useful for exhibiting several interesting aspects of Rust (including the fact that you are allowed to move only some of the fields of a tuple), the notion of raw pointers, etc.
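As a small illustration of the partial-move point, here is a sketch in safe Rust. The function name take_first is invented for this example. Only field 0 is moved out of the tuple; the borrow checker still allows field 1 to be used afterwards, while using x as a whole would be rejected.

```rust
struct A {
    v: i32,
}

// Hypothetical helper: moves one field out of the tuple and reads the other.
fn take_first(x: (A, A)) -> (A, i32) {
    let first = x.0; // moves only field 0 out of the tuple
    let second_v = x.1.v; // field 1 has not been moved, so this is allowed
    // Using `x` as a whole here (e.g. `let y = x;`) would not compile.
    (first, second_v)
}

fn main() {
    let (first, second_v) = take_first((A { v: 1 }, A { v: 2 }));
    println!("{} {}", first.v, second_v);
}
```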

Is there any reason Miri doesn't fail there?

I have a feeling that there might in fact be no UB here, but really I don’t know. The example you posted generalizes to the question “Is it possible that some Rust code runs into UB, whereas the same code with an additional Copy implementation wouldn’t?” Or in other words, “does implementing Copy make a semantic difference (for code that compiles with or without the implementation), or is its only effect that certain new code compiles while it wouldn’t have compiled before?”

I think we all agree that the code would not be UB if A were to implement Copy.

Edit: Code that specializes on a Copy implementation can make Copy implementations entail semantic difference, but this doesn’t count for my question above.

Well, probably with some value in the heap, a Copy implementation could make an actual semantic difference, because of potential optimisations and values being dropped? However, on the stack, the value cannot be dropped before the function terminates, so I'm not sure.

Note that types that can implement Copy cannot have a Drop implementation or drop glue; so “value being dropped” equates to “no-op”.

Ah interesting

It's a thought experiment while writing an interpreter for MIR. Currently, my interpreter forbids such a behaviour, because when a value is moved, I can't use it anymore. But I'm wondering if that behaviour is the correct one.

Dropping is not the same as physical deallocation. Moving a non-Copy value invalidates the previous place of that value, so you are not allowed to access it anymore, regardless of whether or not it was already physically deallocated.

UB is not a synonym for "crash". Your code can have UB (violate the rules of the language definition) and still happen not to crash.

I vote for UB. You are accessing a value that has been moved. I am far from being an expert on UB though, so it's 50:50 that I am wrong!

I understand that for the compiled code. But I would expect Miri to catch such a simple example of UB, if it actually is UB.

For the record, note that there definitely is a non-UB-invoking way to duplicate a non-Copy value like the struct A above, i.e. using the function ptr::read.
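A minimal sketch of what that could look like for the struct A from the original post. The helper name duplicate is made up for this example. The source is never moved from, so both the original and the duplicate stay usable; this is harmless here only because A holds a plain i32 and owns no resources.

```rust
use std::ptr;

struct A {
    v: i32,
}

// Hypothetical helper: bitwise-duplicates an A even though A is not Copy.
fn duplicate(a: &A) -> A {
    // SAFETY: `a` is a valid, aligned reference to an initialized A.
    // A has no Drop impl and owns nothing, so having two copies cannot
    // lead to a double free.
    unsafe { ptr::read(a) }
}

fn main() {
    let a = A { v: 2 };
    let b = duplicate(&a);
    println!("{} {}", a.v, b.v);
}
```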

According to this page in the reference, a place becomes uninitialized after being moved from, so then it seems the usual rules for accessing uninitialized memory would apply. I believe reading uninitialized memory as per this example would be UB.

I feel like this page mostly just describes the behavior of variable initialization checks. It uses terms “deinitialized” and “reinitialized” which may have nothing to do with “uninitialized memory”.

Whoa, Nelly! That's really strange. The documentation for ptr::read() says that it reads from the source without invalidating it.

How is that even possible in the general case? If a non-Copy type is ptr::read() from, then neither the original nor the duplicate place is invalidated? How does that not contradict single ownership? Which place is the real owner of the value, then?
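One way to look at it, as a hedged sketch: ptr::read only makes a bitwise copy, and it is then up to the unsafe code to make sure that just one of the two copies is ever treated as the owner. The function name move_out_via_read is invented for this example; it uses ManuallyDrop so that the original String's destructor never runs and only the copy frees the heap buffer.

```rust
use std::mem::ManuallyDrop;
use std::ptr;

// Hypothetical helper: "moves" a String by hand using ptr::read.
fn move_out_via_read(s: String) -> String {
    // Wrapping in ManuallyDrop means the original will never be dropped.
    let original = ManuallyDrop::new(s);
    // SAFETY: `original` holds a valid, initialized String. After this read
    // we never use or drop `original` again, so `copy` is the sole owner
    // of the heap buffer.
    let copy: String = unsafe { ptr::read(&*original) };
    copy
}

fn main() {
    let t = move_out_via_read(String::from("hello"));
    println!("{t}");
}
```

If both copies were dropped, the buffer would be freed twice. That is the sense in which ptr::read leaves the single-ownership bookkeeping to the caller.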

The way I would think about it is this. Imagine a computer with an extra bit for each byte of memory that stores whether the byte is initialised and good, and with instructions for setting and clearing those extra bits when a value is moved. It also checks the extra bit whenever memory is read, and stops execution entirely, with a flashing red light illuminated, when uninitialised memory is read. I think the red light is going to come on!
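That imaginary machine can be sketched as a toy model. Everything here (ShadowMemory, ReadError, move_value, run_example) is invented for illustration and is not how rustc or Miri are implemented; whether a move really clears the "initialised" bits in Rust's semantics is exactly the open question in this thread. The sketch assumes the source and destination ranges of a move do not overlap.

```rust
#[derive(Debug, PartialEq)]
enum ReadError {
    // The "red light": address of the first uninitialised byte that was read.
    Uninit(usize),
}

struct ShadowMemory {
    bytes: Vec<u8>,
    init: Vec<bool>, // one shadow bit per byte
}

impl ShadowMemory {
    fn new(size: usize) -> Self {
        ShadowMemory { bytes: vec![0; size], init: vec![false; size] }
    }

    fn write(&mut self, addr: usize, data: &[u8]) {
        for (i, b) in data.iter().enumerate() {
            self.bytes[addr + i] = *b;
            self.init[addr + i] = true;
        }
    }

    fn read(&self, addr: usize, len: usize) -> Result<Vec<u8>, ReadError> {
        for i in addr..addr + len {
            if !self.init[i] {
                return Err(ReadError::Uninit(i));
            }
        }
        Ok(self.bytes[addr..addr + len].to_vec())
    }

    // Copies the bytes to `dst`, then marks the source as uninitialised.
    // Assumes the two ranges do not overlap.
    fn move_value(&mut self, src: usize, dst: usize, len: usize) -> Result<(), ReadError> {
        let data = self.read(src, len)?;
        self.write(dst, &data);
        for i in src..src + len {
            self.init[i] = false;
        }
        Ok(())
    }
}

// Models x = (A { v: 1 }, A { v: 2 }); x.0 = x.1; then reads both slots.
fn run_example() -> (Result<Vec<u8>, ReadError>, Result<Vec<u8>, ReadError>) {
    let mut mem = ShadowMemory::new(8);
    mem.write(0, &1i32.to_le_bytes());
    mem.write(4, &2i32.to_le_bytes());
    mem.move_value(4, 0, 4).unwrap();
    (mem.read(0, 4), mem.read(4, 4))
}

fn main() {
    let (first, second) = run_example();
    println!("{first:?} {second:?}");
}
```

In this model the read of the first slot succeeds and the read of the moved-from second slot is rejected, which matches the behaviour of an interpreter that forbids any use of a moved value.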

Right, that seems like a fair interpretation. In other words, I was interpreting that page as reducing the OP's example to:

let x: A;
let sec : *const A = &x;
unsafe {
    (*sec).v
}

But without interpreting that page as such, one could say that the memory in the OP's example is still initialized with a valid value, albeit a moved one. (I'm also guessing this is why Miri does not flag this as UB.)

I'm not sure what distinction @steffahn makes between unqualified "uninitialized" and "uninitialized memory". The very purpose of describing places as initialized or uninitialized is to determine whether/how it is valid to access them.

I believe that it is exactly the case that your "reduced" example is equivalent to OP's code: both read out of an uninitialized place, causing UB. The operational semantics of the language shouldn't (and likely don't) care why the place is uninitialized. It might have never been initialized, or it might have been valid once but moved from. This really shouldn't matter – that is the specific purpose of reasoning with an abstract execution environment rather than hardware-induced happenstance.

In any case, I think we can reduce OP's code to something like

struct A {
    v: i32,
}

fn test(v: A) -> i32 {
    let x: A = v;
    let sec: *const A = &x;
    drop(x);
    unsafe { (*sec).v }
}

fn main() {
    let v: A = A { v: 2 };
    let _ret = test(v);
}

(i.e. get rid of the tuple)

which is code that Miri seems to accept as well.
