A way to remove duplicated code for & and &mut?

Liii · April 19, 2020, 7:09pm

I am trying to understand the problem with duplicated code for & and &mut in accessor-type functions. I'm trying to understand whether a particular solution to this problem is sound[1].

The following is an example of the problem. It is taken from the really nice tutorial Learning Rust With Entirely Too Many Linked Lists.

type Link<T> = Option<Box<Node<T>>>;

pub struct List<T> {
    head: Link<T>,
}

struct Node<T> {
    elem: T,
    next: Link<T>,
}

impl<T> List<T> {
    // Other methods left out...
    
    // The implementation of peek is simple, but still long enough
    // that you'd like to avoid duplicating it if that is possible.
    // Some other accessor-type functions could be much more complex
    // so that you'd want to avoid duplication even more.
    pub fn peek(&self) -> Option<&T> {
        self.head.as_ref().map(|node| {
            &node.elem
        })
    }

    // Exact duplicate of `peek`, except for the types
    pub fn peek_mut(&mut self) -> Option<&mut T> {
        self.head.as_mut().map(|node| {
            &mut node.elem
        })
    }
}

The solution

It seems to me like it is possible to implement peek_mut using the return value peek, casting it to *mut and then to a Option<&mut T>. It seems like the implementation is sound and provides a safe interface for an unsafe implementation.

The following is the implementation:

// Implemention of peek_mut by casting return value of `peek()`
pub fn peek_mut(&mut self) -> Option<&mut T> {
    let head_raw_mut : *mut T = self.peek().map_or_else(std::ptr::null, |r| r) as *mut T; // (1)
    unsafe { head_raw_mut.as_mut() } // (2)
}

This is my reasoning for why I think it is sound:

No unsafe code is used in (1), since it is safe to create raw pointers.
At (2), head_raw_mut is the single pointer to the head element.
The lifetime of the head element is same as self, which is same the as the return value.
head_raw_mut goes out of scope before the new mut ref can be used, so there are never two mutable refs simultanously in use.

Hence the use of the unsafe function pointer::as_mut is sound.

Misc notes

The Nomicon strongly asserts that transmuting & to &mut always have undefined behaviour, which might by applicable also to this code. But I think Nomicon is talking about code which simultaneously keeps and uses shared and multable references. This code doesn't do that.
The Rust Reference states that "Breaking the pointer aliasing rules" is UB. To me it seems like this code doesn't do that, for reasons given above.
I have posted a similar question about another (but worse) solution attempt on Stack Overflow.
There are other kinds of solutions to the problem, such as Abstracting over mutability in Rust. But all other solutions seem to introduce quite a bit of extra complexity to the code.

Super detailed version of the code

The following is the code of the peek_mut, but expanded with separate local variables for the different sub-expressions. In this way it seems even more clear to me that the code doesn't use any unsound combinations of pointers or references.

pub fn peek_mut_exp(&mut self) -> Option<&mut  T> {
    let head_opt_ref_mut : Option<&mut T>;
    {
        let head_raw_mut : *mut T;
        {
            let head_raw_const : *const T;
            {
                let head_opt_ref : Option<&T> = self.peek();
                // Step 1, scope: 1 ref
                head_raw_const = head_opt_ref.map_or_else(std::ptr::null, |r| r);
                // Step 2, scope: 1 ref, 1 raw const ptr
            }
            // Step 3, scope: 1 raw const ptr
            head_raw_mut = head_raw_const as *mut T;
            // Step 4, scope: 1 raw const ptr, 1 raw mut ptr
        }
        // Step 5, scope: 1 raw mut ptr
        head_opt_ref_mut = unsafe { head_raw_mut.as_mut() };
        // Step 6, scope: 1 raw mut ptr, 1 mut ref
    }
    // Step 7, scope: 1 mut ref
    head_opt_ref_mut
}

Questions

I have the following questions for people with deeper Rust knowledge than myself:

Is the provided implementation of peek_mut sound?
If it is indeed not sound, why is that? Can you give a detailed explanation?
Is there in that case a variation of the solution that is sound?

[1]: That is, that the code works as intended and doesn't have undefined behaviour, the definition given by Unsafe Code Guidelines.

Ixrec · April 19, 2020, 7:40pm

I'm no expert on this stuff, but I don't believe any qualification like that was intended. I see similarly strong language without asterisks in other places:

The UnsafeCell documentation says it is undefined behavior, so people shouldn't do it.

In general, transmuting an &T type into an &mut T is considered undefined behavior.

And the Stacked Borrows aliasing model relies on the presence of UnsafeCell to define when shared data may be mutated, so I would guess that the only caveat intended in the above statements is "if it's behind an UnsafeCell" which is not the case here. Specifically, this description of Stacked Borrows says:

For Shared , the permission is different for locations inside of and outside of UnsafeCell . Inside UnsafeCell , it is SharedReadWrite ; outside it is SharedReadOnly .

Finally, playground MIRI does see UB in this code:

error: Undefined Behavior: trying to reborrow for SharedReadWrite, but parent tag <1660> does not have an appropriate item in the borrow stack
   --> /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177:1
    |
177 | / pub unsafe fn drop_in_place<T: ?Sized>(to_drop: *mut T) {
178 | |     // Code here does not matter - this is replaced by the
179 | |     // real drop glue by the compiler.
180 | |     drop_in_place(to_drop)
181 | | }
    | |_^ trying to reborrow for SharedReadWrite, but parent tag <1660> does not have an appropriate item in the borrow stack
    |
    = help: this indicates a potential bug in the program: it performed an invalid operation, but the rules it violated are still experimental
    = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
    = note: inside `std::intrinsics::drop_in_place::<Node<i32>> - shim(Some(Node<i32>))` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177:1
    = note: inside `std::intrinsics::drop_in_place::<std::boxed::Box<Node<i32>>> - shim(Some(std::boxed::Box<Node<i32>>))` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177:1
    = note: inside `std::intrinsics::drop_in_place::<std::option::Option<std::boxed::Box<Node<i32>>>> - shim(Some(std::option::Option<std::boxed::Box<Node<i32>>>))` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177:1
    = note: inside `std::intrinsics::drop_in_place::<List<i32>> - shim(Some(List<i32>))` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177:1
note: inside `main` at src/main.rs:31:1
   --> src/main.rs:31:1
    |
31  | }
    | ^
    = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:67:34
    = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:73
    = note: inside `std::sys_common::backtrace::__rust_begin_short_backtrace::<[closure@DefId(1:6032 ~ std[4cdb]::rt[0]::lang_start_internal[0]::{{closure}}[0]::{{closure}}[0]) 0:&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/sys_common/backtrace.rs:130:5
    = note: inside closure at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:52:13
    = note: inside `std::panicking::r#try::do_call::<[closure@DefId(1:6031 ~ std[4cdb]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:331:40
    = note: inside `std::panicking::r#try::<i32, [closure@DefId(1:6031 ~ std[4cdb]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe]>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:274:15
    = note: inside `std::panic::catch_unwind::<[closure@DefId(1:6031 ~ std[4cdb]::rt[0]::lang_start_internal[0]::{{closure}}[0]) 0:&&dyn std::ops::Fn() -> i32 + std::marker::Sync + std::panic::RefUnwindSafe], i32>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394:14
    = note: inside `std::rt::lang_start_internal` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:51:25
    = note: inside `std::rt::lang_start::<()>` at /playground/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/rt.rs:67:5

error: aborting due to previous error; 1 warning emitted

error: could not compile `playground`.

To learn more, run the command again with --verbose.

This is not exactly a great error message, and unfortunately it doesn't point to any part of peek_mut, but I believe this is the important part:

trying to reborrow for SharedReadWrite, but parent tag <1660> does not have an appropriate item in the borrow stack

And that does line up with the theory that you need an UnsafeCell to make this legal.

Hopefully someone who really knows what they're talking about can confirm/deny this.

alice · April 19, 2020, 7:47pm

See also this similar thread.

It is always UB to cast &T to &mut T under all circumstances. Even if the & originated from a &mut, the compiler added extra assertions on the pointer after it was made immutable that you can't just ignore.

RustyYato · April 19, 2020, 10:07pm

The only case where it is not immediately UB to derive a &mut T from a &U is if you go through UnsafeCell in that derivation. No other exceptions. So since simply transmuting between &T to &mut T doesn't go through an UnsafeCell, it is immediate UB. Note: even when using UnsafeCell it is rather easy to get UB via aliasing &mut _ if you aren't careful.

TomP · April 20, 2020, 4:25am

Just to clarify: Don't think "UB is something I'm willing to live with." Think "UB releases the compiler to screw up my program in its quest for hyper-optimization." That discrepancy from what you thought you coded proivdes the opportunity for malware explotation. UB is never safe, ever.

Liii · April 20, 2020, 3:55pm

That is certainly a strong indication that something in my code is not allowed...

Thanks for the link! I will study that article. Maybe it will help me understand something about the underlying problem: How to reason about borrows in unsafe code?

But here there is an asterisk! The little phrase "in general"...

Liii · April 20, 2020, 3:56pm

Detailed Argument

But how about the detailed argument:

Liii:

// Implemention of peek_mut by casting return value of `peek()`
pub fn peek_mut(&mut self) -> Option<&mut T> {
    let head_raw_mut : *mut T = self.peek().map_or_else(std::ptr::null, |r| r) as *mut T; // (1)
    unsafe { head_raw_mut.as_mut() } // (2)
}

After the execution of the line marked with (1) no shared ref to the head element exist anymore, it is out of scope and gone. We're left with the value in memory and a single unique raw pointer to it.

And only safe code has been used to produce this pointer.

Does it matter how we got this pointer? It it raw and unique and has a certain pointer value. Isn't that all that affects the situation?

Because of this, when the unsafe function as_mut is called the program is in a valid state. The specification of as_mut is followed, most importantly this:

"the memory this pointer points to must not get accessed (read or written) through any other pointer."

Surely it must be sound to construct a mutable reference from head_raw_mut under those conditions?

The version of the code with expanded sub-expressions show even more clearly that the original shared ref is gone, since the block it is defined in has been left.

Wrapper for C components

This is how many wrappers for C API:s work. You pass a shared ref to some C function. Then the shared ref goes out of scope. The program continues and does other stuff. At some later point the same pointer value is returned from the C API as a raw pointer. Since you know that it is unique you can convert it to a unique pointer and mutate its target.

It seems impossible to write a wrapper for a C component if you cannot do this kind of things...

Liii · April 20, 2020, 3:57pm

But my point is that the original reference is out of scope and gone. Any assertions that the compiler had about this variable are also gone and not relevant anymore. All that is left is a raw pointer with a certain pointer value.

TomP · April 20, 2020, 4:06pm

Ask yourself "Is there any region in the program where two &mut references exist for the same variable?" If the answer is "Yes" then the compiler backend (e.g. LLVM) is permitted to mis-compile your program as a byproduct of its quest for hyper-optimization.

Liii · April 20, 2020, 4:18pm

To me it seems like this is not the case. In peek_mut_exp we can see this:

In the beginning of the function & co-exists with *const but is then dropped.
In the end of the function &mut co-exists with *mut, but the former is then dropped.

Apart from that only raw pointers co-exist with each other.

But it might be that Rust's rules for soundness is more strict that the ones that you give. The reason might be for example to allow for reordering of assignments and inlining of constants.

skysch · April 20, 2020, 4:18pm

The phrase "in general" means that you cannot transmute an &T into a &mut T, unless you have something that specifically allows it, which in this case is UnsafeCell. It is not telling you that you can concoct some other way around it, it is saying that you cannot.

Now, maybe that is wrong. Perhaps there is some way to make this not-UB without using UnsafeCell. But the phrase "in general" here is not in any way allowing for that, and you'd have to know exactly what optimizations Rust will do and know that no new optimization will be created that will invalidate your assumptions. That means this is UB, because UnsafeCell is the only optimization-aware, forward-compatable escape hatch that exists (AFAIK) for doing this sort of thing (if indeed this sort of thing is actually that UnsafeCell is meant to aid in.)

mbrubeck · April 20, 2020, 4:30pm

Sometimes it's possible to factor out duplicate code like this by writing a peek_raw function that only uses raw pointers, and returns a raw pointer. Then peek and peek_mut would call peek_raw.

I can't see a way to do it here, though, because I don't think it's possible to get a *const T from an Option<T> without going through &Option<T> or &mut Option<T>. (Or, more generally, because you can't match directly on a raw pointer to an enum.)

Even when it works, one downside to this approach is that it introduces much more unsafe code than necessary.

Ixrec · April 20, 2020, 4:55pm

Well, I thought it was obvious that the "detailed argument" wasn't valid once we established that the & -> &mut conversion is UB after all, but okay we can go through that too.

Technically correct, but that would never be relevant in any soundness argument. When a module contains a single unsafe anywhere, all of the other safe code in that same module is potentially relevant for soundness. The classic example here is Vec::set_len()'s implementation being a trivial one-liner of safe code, but loads of unsafe code in Vec relies on the len field being not just a valid integer but the right integer, so if set_len wasn't an unsafe function that would be a giant soundness hole.

Correct.

This is probably a fundamental misunderstanding of what lifetimes are. For now, let's just point out that Stacked Borrows (and AFAIK any other plausible model of unsafe Rust semantics) doesn't have lifetimes at all. The lifetime analysis that Rust does is entirely separate from Stacked Borrows and permits only a subset of the programs Stacked Borrows does (after all, that's why you need unsafe: to write code that Rust can't prove is correct, but you can).

What Stacked Borrows cares about here is the permission tags on each pointer, and as we saw above this pointer has only a SharedReadOnly permission, not the SharedReadWrite permission you would need to make it legal to turn the pointer into a &mut.

Correct.

In general, your last few posts seem to be arguing not on the basis of how Rust defines UB (or, well, Stacked Borrows, since Rust has no official precise definition yet), but on what you would've expected to be UB or what seems implementable to you. "I never have two &muts aliasing at the same time, so this must be okay." The premise is entirely correct, and nobody's disputing that; it's just not enough to prove soundness.

This sort of miscommunication is an extremely common problem in the C and C++ worlds, largely because of a persistent multi-decade failure to communicate between tool vendors/standards committees and the language users. But it does remain a misconception: the definition of Rust UB is whatever Rust says it is. Rust does impose more requirements than just "no aliasing &muts", and unsafe code has to ensure all of them or it's still UB.

Once we get past that mis-framing of the problem, then we can have more interesting discussions about what Rust should define as UB to allow some code and optimizations at the expense of forbidding other code. Although in this case there's not much to say: we clearly need Stacked Borrows to consider all safe code UB-free, and this UnsafeCell requirement has been a key part of interior mutability in safe code since at least Rust 1.0, so there's not much wiggle room here.

trentj · April 20, 2020, 6:13pm

If you are familiar with C, it may help to point out that &mut references are not just non-const pointers "plus lifetimes" -- they are more or less the equivalent of restrict-qualified pointers, except that &mut references are allowed in more places, easier to use correctly and consequently much more common in Rust than restrict pointers are in C.

TomP · April 20, 2020, 7:18pm

The condition relative to &mut in my post upthread was written hurriedly before an appointment, and was incorrect. The wording should have been:

Stacked Borrows attempts to mimic the way that LLVM analyses its input. If Stacked Borrows finds a problem, then it is almost certain that LLVM is permitted to miscompile your program in its quest for optimization.

cuviper · April 20, 2020, 7:41pm

Note that shared & references are also like restrict pointers when they don't contain UnsafeCell, through the internal marker trait Freeze. In fact, rustc only emits LLVM noalias for those references right now, and #54878 tracks the status of LLVM so we can enable noalias for &mut in the future.

bogmihdav · April 21, 2020, 8:52am

Given how Fn is already using Higher-rank trait bounds to deal with lifetimes, couldn't they be generalized to cover mut-ness too?

I'm imagining something like where for<mut? 'a> F: Fn(&mut? 'a T1) -> &mut? 'a T2, and have the lambda expression return a &mut? as well. I'm not very familiar with the subject matter, so I'm not sure how feasible this would be, and if not, why not.

Hyeonu · April 21, 2020, 9:01am

This is not true, Rust's rule is even stricter. You cannot transmute &T into a &mut T. No you can't, no exception.

What UnsafeCell<T> allows is to get *mut T from &UnsafeCell<T> and make &mut T from it with some runtime check. You can't transmute &UnsafeCell<T> into &mut UnsafeCell<T>.

glandium · April 21, 2020, 9:04am

A macro that rewrites the body of the function to create another function for the other case, replacing the relevant tokens, would work too. The main problem would be indicating to the macro what the relevant tokens are and what they should be replaced with. For a simple case like OP, replacing & with &mut and as_ref with as_mut is enough, but that's a simple case. There are cases where you'd need to only change a couple &s and not others. And cases where as_ref doesn't apply to the thing that needs to be mutable.

system · July 20, 2020, 9:04am

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Is it fine to have multiple mutable pointers to something if I don't dereference more than one?	31	4271	January 18, 2021
Re-interpreting a Repr-C'd struct's bytes as a certain type	41	1639	November 8, 2019
How to avoid write almost the same code due to borrow checking help	11	686	May 21, 2022
Why can't we have multiple &mut help	43	3915	January 12, 2023
Questions about `&mut T` and move semantics, `&mut T` is "move-only"? help	51	6031	April 28, 2020

A way to remove duplicated code for & and &mut?

Related topics