Scoped thread: what effect the MaybeDangling in thread::bulder::spawn_unchecked_ takes

The current code in std contains the fix for the issue , I can't understand why the unsound can be fixed by introducing MaybeDangling.

    unsafe fn spawn_unchecked_<'a, 'scope, F, T>(
        self,
        f: F,
        scope_data: Option<Arc<scoped::ScopeData>>,
    ) -> io::Result<JoinInner<'scope, T>>
    where
        F: FnOnce() -> T,
        F: Send + 'a,
        T: Send + 'a,
        'scope: 'a,
    {
        let Builder { name, stack_size } = self;

        let stack_size = stack_size.unwrap_or_else(|| {
            static MIN: AtomicUsize = AtomicUsize::new(0);

            match MIN.load(Ordering::Relaxed) {
                0 => {}
                n => return n - 1,
            }

            let amt = env::var_os("RUST_MIN_STACK")
                .and_then(|s| s.to_str().and_then(|s| s.parse().ok()))
                .unwrap_or(imp::DEFAULT_MIN_STACK_SIZE);

            // 0 is our sentinel value, so ensure that we'll never see 0 after
            // initialization has run
            MIN.store(amt + 1, Ordering::Relaxed);
            amt
        });

        let my_thread = name.map_or_else(Thread::new_unnamed, Thread::new);
        let their_thread = my_thread.clone();

        let my_packet: Arc<Packet<'scope, T>> = Arc::new(Packet {
            scope: scope_data,
            result: UnsafeCell::new(None),
            _marker: PhantomData,
        });
        let their_packet = my_packet.clone();

        let output_capture = crate::io::set_output_capture(None);
        crate::io::set_output_capture(output_capture.clone());

        // Pass `f` in `MaybeUninit` because actually that closure might *run longer than the lifetime of `F`*.
        // See <https://github.com/rust-lang/rust/issues/101983> for more details.
        // To prevent leaks we use a wrapper that drops its contents.
        #[repr(transparent)]
        struct MaybeDangling<T>(mem::MaybeUninit<T>);
        impl<T> MaybeDangling<T> {
            fn new(x: T) -> Self {
                MaybeDangling(mem::MaybeUninit::new(x))
            }
            fn into_inner(self) -> T {
                // Make sure we don't drop.
                let this = ManuallyDrop::new(self);
                // SAFETY: we are always initialized.
                unsafe { this.0.assume_init_read() }
            }
        }
        impl<T> Drop for MaybeDangling<T> {
            fn drop(&mut self) {
                // SAFETY: we are always initialized.
                unsafe { self.0.assume_init_drop() };
            }
        }

        let f = MaybeDangling::new(f);
        let main = move || {
            if let Some(name) = their_thread.cname() {
                imp::Thread::set_name(name);
            }

            crate::io::set_output_capture(output_capture);

            let f = f.into_inner();
            set_current(their_thread);
            let try_result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
                crate::sys::backtrace::__rust_begin_short_backtrace(f)
            }));
            // SAFETY: `their_packet` as been built just above and moved by the
            // closure (it is an Arc<...>) and `my_packet` will be stored in the
            // same `JoinInner` as this closure meaning the mutation will be
            // safe (not modify it and affect a value far away).
            unsafe { *their_packet.result.get() = Some(try_result) };
            // Here `their_packet` gets dropped, and if this is the last `Arc` for that packet that
            // will call `decrement_num_running_threads` and therefore signal that this thread is
            // done.
            drop(their_packet);
            // Here, the lifetime `'a` and even `'scope` can end. `main` keeps running for a bit
            // after that before returning itself.
        };

        if let Some(scope_data) = &my_packet.scope {
            scope_data.increment_num_running_threads();
        }

        let main = Box::new(main);
        // SAFETY: dynamic size and alignment of the Box remain the same. See below for why the
        // lifetime change is justified.
        let main =
            unsafe { Box::from_raw(Box::into_raw(main) as *mut (dyn FnOnce() + Send + 'static)) };

        Ok(JoinInner {
            // SAFETY:
            //
            // `imp::Thread::new` takes a closure with a `'static` lifetime, since it's passed
            // through FFI or otherwise used with low-level threading primitives that have no
            // notion of or way to enforce lifetimes.
            //
            // As mentioned in the `Safety` section of this function's documentation, the caller of
            // this function needs to guarantee that the passed-in lifetime is sufficiently long
            // for the lifetime of the thread.
            //
            // Similarly, the `sys` implementation must guarantee that no references to the closure
            // exist after the thread has terminated, which is signaled by `Thread::join`
            // returning.
            native: unsafe { imp::Thread::new(stack_size, main)? },
            thread: my_thread,
            packet: my_packet,
        })
    }

I have read the cause of the unsound:

Specifically, this main function here takes as implicit argument its closure environment, which contains f . That environment might just be a reference to some memory that is allocated in the caller stack frame. However the main function can keep running after their_packet got dropped (implicitly at the end of main ). Then the scope might end and the memory might be deallocated all while main is still running. If the environment is just a reference, it ends up being a newtype and we will (AFAIK) add the dereferenceable attribute, meaning deallocation while the function runs is actually LLVM UB.

What is the environment in That environment might just be a reference to some memory?
The main closure move all variables in spawn_unchecked_, including some reference. Because any variable referenced by another variable in current stack frame can't be moved, e.g.

fn main () {
    let z = String::from("10");
    let zr = &z;
    let d = move|| {
        println!("{:p}, {:p}", zr, &z);
    };
}

can't be accepted, so I think the environment means some reference held by f, which may point to some variable in caller's stack of the spawn_unchecked_.

But why MaybeDangling can move the unsound away? After all, if some reference is held in the f, then after drop(their_packet), the stack may pop, the reference in f still will dangle in concept ( dereferenceable attribute in llvm mentioned in the post?), with or without MaybeDangling, there are no differences.

I guess the real question is why passing a reference in MaybeUninit can erase the UB in llvm.
I guess in the view of llvm, by declaring the dereferenceable it is allowed to deference at anytime, then it may freely produce code like that:

let main = move || {
    let f = MaybeUninit.assume_read_init();
    let reference = f.reference;

    f(...);
    drop(their_packet);
    
    // for some unknown reason:
    dereference(reference);
}

Well, UB still occurs even with MaybeUninit

There's details in the RFC, but basically,

In LLVM terms: we are annotating that reference with dereferenceable, which means “dereferenceable for the entire duration of this function call”

and

To handle situations like this, Rust has a special type called MaybeDangling<P>: references and boxes in P do not have to be dereferenceable or follow aliasing guarantees.

(I didn't look into the linked issue in depth since it didn't seem like you'd read the RFC. The issue is one of the RFC examples.)

Oh, yes, the issue is exactly why the RFC is proposed.
Then, since the RFC has not been implemented yet, so, the code in std still containing an UB?

And, if the RFC is implemented, should it just be stopping annotating the dereferenceable by rustc?

Looks like it's still a workaround using MaybeUninit. But the workaround is intended to remove the UB.

And noalias. If I said anymore I'd just be proxying the RFC.

Ok, I believe I figured out why MeybeUninit can be used here to remove the UB: because it is not assured to be initialized, then any reference in T can't not be deferenceable by definition.

So, references in MeybeUninit does not has this attribute.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.