Transmute (lifetime extension) review request

Hello fellows!

Could you review this simple usage of std::mem::transmute (lifetime extension): playground (commented)?

From the docs:

transmute is incredibly unsafe. There are a vast number of ways to cause undefined behavior with this function. transmute should be the absolute last resort.

:scream:

There are a few things that transmute is really useful for. [...] Extending a lifetime, or shortening an invariant lifetime. This is advanced, very unsafe Rust!

^ Yep this is what we are doing!

Thank you all!

This is unsound because even though the iterator doesn't use its mutable reference to the already iterated-over data, because it still holds a mutable reference to the region containing the data it counts. You will need to modify your iterator implementation to split up the mutable reference somehow.

1 Like

Usually the suggestion is to test various usages with Miri. In this case if you collect a Col<&mut RowMajor3x3> in a Vec and then loop through that Vec you'll trigger miri (playground, run miri from the "Tools" tab). I think the collection is needed to trigger Miri, but the UB is always present.

The problem here is that mutable references come with additional guarantees compared to pointers, in particular you can't alias two mutable references, but you code does exactly this, even though it never lets outside code to get those references.

I think that for things like these you're better off with raw pointers since they don't need to upheld those additional guarantees.

Thanks guys!

@Kestrer You will need to modify your iterator implementation to split up the mutable reference somehow.

This is OK for Col, but won't do it in Cols!

@Kestrer [Unsound] because it still holds a mutable reference to the region containing the data it counts.

@SkiFire13 You can't alias two mutable references [...], even though it never lets outside code to get those references.

Why is this unsound? What kind of issues is this implying?

@SkiFire13 I think the collection is needed to trigger Miri, but the UB is always present.

Why both code (with and without collect) give correct results?

@SkiFire13 I think that for things like these you're better off with raw pointers since they don't need to upheld those additional guarantees.

How? From Nomicon:

Also of course you can get all of the functionality of these functions [(transmute, transmute_copy)] using raw pointer casts or union s, but without any of the lints or other basic sanity checks. Raw pointer casts and union s do not magically avoid the above rules.

Because the compiler assumes that that can never happen. If it assumes things that aren't actually true then it could miscompile your program and produce nasty and hard to find bugs.

Because in this case it gives the expected result. UB can be anything, including what you expect it to be.

I'm not suggesting to change the transmute with raw pointers, but to store raw pointers inside Col and Cols and dereference them only when needed. Also, transmute and transmute_copy give useful lints when you're transmuting between different types, they're pretty much useless when transmuting lifetimes, so you aren't losing anything by not using them.

5 Likes

If you'd like a more concrete (albeit contrived) example of what could possibly go wrong:

fn tempt_the_compiler(one: &mut usize, two: &mut usize) {
    *one = 0;
    if *two != 0 {
        brick_this_computer();
    }
}

Because the compiler assumes there can't be two mutable references to the same memory, it's free to reorder the assignment and the if block.

@truchi I just wanted to say, I find your use of emoji in your post and the playground very entertaining!

:+1:t3::+1:t3: :grin:

And also thanks for taking soundness and the dangers of unsafe code serious by asking others for help with understanding the details.


By the way, you can (in the future) replace your hand-crafted quotes using mentions, e.g.

@truchi Thanks guys!

with proper quotes, e.g.

Those look fancier and link back to the original post they’ve quoted. They can be created by just highlighting the text you want to quote and clicking the “Quote” button that appears then. If you do this while you’re already editing a response, it’ll get inserted where your cursor last was in the editor. You can even quote cross-topic, e.g.

by simply navigating the website while keeping the editor open. (Yes it doesn’t close if you leave the topic.)

:blush:

playground? (macros sorry)

Miri is OK with that. I don't know what I am doing though. Features usize as isize.

(Btw type is not u8 IRL)

I'm pretty sure this solved the original problem of overlapping references.

However the interface of new methods is unsound, they shouldn't accept a raw pointer like that. Since they're safe to call I could just pass a null pointer or an invalid column and easily get UB from safe code.

Consider for example this code:

fn main() {
    let i = 0u8;
    let col = Col::new(&i, 0).collect::<Vec<_>>();
    let col = Col::new(&i, 1_000_000).collect::<Vec<_>>();
}

It's pure safe rust, however the interface of Col::new allows it to read arbitrary data!

Usually the right solution is separate the safe interface which takes references and checkes other parameters from the unsafe interface, which requires the caller to do the proper checks. Obviously the unsafe interface should be properly marked unsafe.

So... you +1 that last link?

I tried with *[const|mut] RowMajor3x3 instead of *[const|mut] u8 but Miri went back at it...

Right. Was not bothering for the example.

Do you have input for usize as isize? Docs says it's usually fine. Should I even care? I cannot find where Iter/IterMut do this. Though they do check ZST for a reason I don't understand.

Hum... why is this triggering Miri: playground ?

(*[const|mut] RowMajor3x3)

Would make my APIs a little nicer. I'm getting picky.

EDIT: OK I think I get it. Raw pointers not transitive right?

Let's say +1/2. It's still very unsound.

That still involves passing raw pointers from safe code.

It should be fine. If you want you can use add instead of offset since it accepts an usize, but there's the same requirement that it shouldn't overflow an isize. If you really care about that you could just assert that the length is less than isize::MAX or use wrapping_add. ZST are a special case because they don't occupy memory, so you could create any number of them.

I think that by dereferencing self.ptr you're creating a temporary reference that invalidates the other pointers.

For the nicer API, just take a reference in the safe interface and do the casting to *[const|mut] u8 internally before calling the unsafe interface.

What could we do to make it > +1/2 (besides unsafe fn)?

Aside from properly marking unsafe functions and validating input in the safe ones I don't think there's anything important to do.

Thanks, that was really helpful.

I will post the real life example if I survive it!