Review of "nested" async interrupt executor for UB

This is a follow-up question to "Would be lovely if s/o could check this simple async executor for UB".

While the linked topic implemented a block_on() method to drive a task to completion from a blocking context, this time I intend the interrupt executor to spawn() a task from an async context driven by a different "upper" executor:

use core::{
    cell::UnsafeCell,
    future::poll_fn,
    pin::{Pin, pin},
    ptr::{self, null_mut},
    sync::atomic::compiler_fence,
    task::{Context, Poll, RawWaker, RawWakerVTable, Waker},
};

use nrf52840_pac::{NVIC, SWI0, interrupt};
use portable_atomic::{AtomicPtr, Ordering};
use static_cell::StaticCell;

use crate::util::CancellationGuard;

// Safety: While the task pointer is `null` the scheduling context may mutate
//         the outer waker. If the task pointer is non-null then the outer_waker
//         may be read by the interrupt context. The getter and setter
//         implementations for the task pointer must ensure proper compiler
//         fencing.
struct State {
    // The task pointer points to the inner future to be executed. The task
    // pointer will be modified from both, scheduling and interrupt context.
    task_ptr: AtomicPtr<Pin<&'static mut dyn Future<Output = ()>>>,

    // The outer waker will only be present when spawning a task from within
    // another task (nested executors). The outer waker will only be modified
    // from scheduling context and then be released read-only to both,
    // scheduling and interrupt context.
    outer_waker: UnsafeCell<Option<Waker>>,

    // The inner waker is immutable. It may be accessed from interrupt context
    // only.
    inner_waker: Waker,
}

impl State {
    const fn new(raw_inner_waker: RawWaker) -> Self {
        Self {
            task_ptr: AtomicPtr::new(null_mut()),
            outer_waker: UnsafeCell::new(None),
            inner_waker: unsafe { Waker::from_raw(raw_inner_waker) },
        }
    }

    /// Initialize the interrupt executor.
    ///
    /// # Safety
    ///
    /// This method unmasks the interrupt and therefore may break concurrent
    /// critical sections. It must be called in early initialization code before
    /// concurrent critical sections might be active.
    ///
    /// Using the executor w/o calling `init()` will cause undefined behavior.
    unsafe fn init(&self) {
        NVIC::unpend(interrupt::SWI0_EGU0);
        // Safety: See method doc. There should be no concurrent critical sections.
        unsafe { NVIC::unmask(interrupt::SWI0_EGU0) };
    }

    fn set_task_ptr(&self, task_ptr: *mut Pin<&'static mut dyn Future<Output = ()>>) {
        compiler_fence(Ordering::Release);
        self.task_ptr.store(task_ptr, Ordering::Relaxed);
    }

    fn task_ptr(&self) -> *mut Pin<&'static mut dyn Future<Output = ()>> {
        let task_ptr = self.task_ptr.load(Ordering::Relaxed);
        compiler_fence(Ordering::Acquire);
        task_ptr
    }

    // Safety: Must only be called from scheduling context and requires the
    //         outer waker to be acquired to the scheduling context (i.e. the
    //         task ptr to be null).
    unsafe fn set_outer_waker(&self, waker: Waker) {
        let outer_waker = unsafe { self.outer_waker.get().as_mut() }.unwrap();
        *outer_waker = Some(waker);
    }

    // Safety: Must only be called from scheduling context and requires the
    //         outer waker to be acquired to the scheduling context (i.e. the
    //         task ptr to be null).
    unsafe fn clear_outer_waker(&self) {
        let outer_waker = unsafe { self.outer_waker.get().as_mut() }.unwrap();
        *outer_waker = None;
    }

    // Safety: The outer waker is immutably read-only while owned by the
    //         interrupt handler. May be accessed from both, interrupt and
    //         scheduling context.
    unsafe fn outer_waker(&self) -> &Option<Waker> {
        unsafe { self.outer_waker.get().as_ref() }.unwrap()
    }

    fn pend_interrupt() {
        // Safety: Triggering a task is atomic and idempotent.
        NVIC::pend(interrupt::SWI0_EGU0);
    }

    fn on_interrupt(&self) {
        let task = self.task_ptr();

        // Safety: We're converting from a pointer that has been generated verbatim
        //         from a valid &mut reference. Therefore the pointer will be
        //         properly aligned, dereferenceable and point to a valid pinned
        //         future. Pinning and synchronizing via the pointer ensures that
        //         the pointer cannot dangle. Checking for null pointers is
        //         required to protect against spurious wake-ups.
        if let Some(task) = unsafe { task.as_mut() } {
            // The inner waker is shared immutably.
            let mut cx = Context::from_waker(&self.inner_waker);

            // Safety: A non-null task pointer indicates that the interrupt
            //         temporarily owns the task and outer waker.
            match task.as_mut().poll(&mut cx) {
                core::task::Poll::Ready(_) => {
                    if let Some(outer_waker_ref) = unsafe { self.outer_waker() } {
                        outer_waker_ref.wake_by_ref();
                    }

                    // Safety: Setting the task pointer to null hands ownership over
                    //         task and outer waker back to the scheduling context.
                    self.set_task_ptr(null_mut());
                }
                core::task::Poll::Pending => {}
            }
        }
    }

    // QUESTION: Does this have to be
    // `sync fn spawn<Task: Future<Output = ()> + 'static>(&self, task: Task)` ?
    async fn spawn<Task: Future<Output = ()>>(&self, task: Task) {
        debug_assert!(NVIC::is_enabled(interrupt::SWI0_EGU0), "not initialized");

        // Safety: The pinned task must not move while the interrupt accesses
        //         it. Note that this is not automatically enforced by Pin (as
        //         the pointee may be Unpin).
        let mut pinned_task: Pin<&mut dyn Future<Output = ()>> = pin!(task);
        let pinned_task_ref = &mut pinned_task;

        let cancellation_guard = CancellationGuard::new(|| {
            // Safety: The interrupt checks the task pointer and then runs
            //         atomically from the perspective of the scheduling
            //         context. So it's ok to drop this task at any point from
            //         the scheduling context.
            self.set_task_ptr(null_mut());
            // Safety: Setting the task pointer to null acquired the outer
            //         waker.
            unsafe { self.clear_outer_waker() };
        });

        poll_fn(|cx| {
            // Safety: Loading the task pointer (re-)acquires the outer waker's
            //         memory.
            let task_ptr = self.task_ptr();
            let outer_waker = unsafe { self.outer_waker() };

            if let Some(outer_waker_ref) = outer_waker {
                if task_ptr.is_null() {
                    // A task pointer value of null while a waker is present
                    // signals that the task finished.

                    // Safety: Loading a null value for the task pointer
                    //         re-acquires ownership of both, task and outer
                    //         waker, to the scheduling context so they can be
                    //         dropped. Also note that we asserted that the
                    //         (immutable) waker is "some", so no need to
                    //         re-check here.
                    unsafe { self.clear_outer_waker() };

                    Poll::Ready(())
                } else {
                    // The interrupt never wakes an unfinished task, therefore a
                    // non-null task pointer value while a waker is present
                    // signals an unsolicited poll by the outer executor, e.g.
                    // when racing concurrent tasks in a select primitive.

                    // Safety: While the interrupt is active we only have shared
                    //         read-only access to the waker. We therefore do
                    //         not support migration of the task.
                    debug_assert!(outer_waker_ref.will_wake(cx.waker()));

                    Poll::Pending
                }
            } else {
                // We're polled for the first time...
                debug_assert!(task_ptr.is_null());

                // Safety: While the task pointer is null, the scheduling
                //         context exclusively owns the outer waker for
                //         modification.
                unsafe { self.set_outer_waker(cx.waker().clone()) };

                // Setting the task pointer releases both, task and waker,
                // to the interrupt context.
                self.set_task_ptr(ptr::from_mut(pinned_task_ref).cast());

                // Safety: Once the interrupt context owns task and waker, it is
                //         safe to trigger it.
                State::pend_interrupt();

                Poll::Pending
            }
        })
        .await;

        cancellation_guard.inactivate();

        // Safety: We need to extend lifetime until we're sure the task
        //         is no longer being accessed.
        #[allow(clippy::drop_non_drop)]
        drop(pinned_task);
    }
}

// Safety: We synchronize the contents of state via the task atomic.
unsafe impl Sync for State {}

const fn raw_waker() -> RawWaker {
    RawWaker::new(ptr::null(), &VTABLE)
}

unsafe fn clone_waker(data: *const ()) -> RawWaker {
    // Safety: We always return the same (static) vtable reference to ensure
    //         that `Waker::will_wake()` recognizes the clone.
    RawWaker::new(data, &VTABLE)
}

unsafe fn wake(_: *const ()) {
    // Safety: Triggering a task is atomic and idempotent.
    State::pend_interrupt();
}

unsafe fn wake_by_ref(_: *const ()) {
    // Safety: Triggering a task is atomic and idempotent.
    State::pend_interrupt();
}

unsafe fn drop_waker(_: *const ()) {
    // no-op
}

static VTABLE: RawWakerVTable = RawWakerVTable::new(clone_waker, wake, wake_by_ref, drop_waker);
static STATE: State = State::new(RawWaker::new(ptr::null(), &VTABLE));

#[interrupt]
fn SWI0_EGU0() {
    STATE.on_interrupt();
}

pub struct InterruptExecutor;

impl InterruptExecutor {
    /// Installs the given task as stateful interrupt handler.
    ///
    /// The interrupt will be pended and the state machine driven to completion
    /// as soon as the resulting future is being `await`ed.
    ///
    /// Mutability proves exclusive ownership of the executor, i.e. only a
    /// single task can be driven by this executor at any given time.
    async fn spawn<Task: Future<Output = ()>>(&mut self, task: Task) {
        STATE.spawn(task).await;
    }
}

static EXECUTOR: StaticCell<InterruptExecutor> = StaticCell::new();

/// Initialize the interrupt executor.
///
/// Takes exclusive ownership of the software interrupt backing the executor.
///
/// # Safety
///
/// This method unmasks the interrupt and therefore may break concurrent
/// critical sections. It must be called in early initialization code before
/// concurrent critical sections might be active.
///
/// # Panics
///
/// This method may only be called once. Will panic on subsequent calls.
pub unsafe fn executor(_: SWI0) -> &'static mut InterruptExecutor {
    unsafe { STATE.init() };
    EXECUTOR.init(InterruptExecutor)
}

The API is intended to be used like this:

    async fn run(&mut self) -> SomeResult {
        let mut result: Option<SomeResult> = None;

        // Note: We prefer passing the result through a captured
        //       variable as this is more efficient than moving it via
        //       the poll result, interrupt handler and interrupt executor
        //       back to outer context.
        self.executor
            .spawn(async {
                    let r = Self::peripheral();
                    poll_fn(|_| {
                        if r.events_dma_end.read().events_dma_end().bit_is_set() {
                            r.intenclr.write(|w| w.dma_end().set_bit());
                            r.events_dma_end.reset();
                            Poll::Ready(())
                        } else {
                            r.intenset.write(|w| w.dma_end().set_bit());
                            Poll::Pending
                        }
                    })
                    .await;

                dma_end_fence();
                let some_result: SomeResult = ...read from DMA buffer...;
                result = Some(some_result);
            })
            .await;

        result.unwrap()
    }

This seems to work in practice because the spawned task is immediately awaited and driven to completion before the outer task resumes.

The API, however, is highly reminiscent of "stacked futures" and Forgetting futures with borrowed data - language design - Rust Internals except that it runs exclusively in a no-alloc environment w/o Box::pin() and Rc::pin().

I'm not able to reproduce the unsoundness in a no-alloc environment. Trying to mem-forget with pin! doesn't compile.

Questions:

  • Does this mean that the pattern is safe if I ensure that heap allocation is unavailable? But could I ensure that at all at compile time?
  • If UB can be triggered even in no-alloc or if no-alloc cannot be proven: Would the spawn() function be sound if it required a 'static lifetime for the task?
  • Alternatively: Would it become safe, if it only accepted already pinned tasks?
  • If I made spawn() unsafe instead, would it be sufficient to document that the task passed to spawn() must not be leaked before driven to completion, similarly to what the async-scoped crate does ?

@alice You seem to be very deep into async and your feedback was very valuable already. Maybe you could have a look as you already know what brought me there? For context: This is part of a contribution to the (upcoming) dot15d4 crate: Introduction of interrupt executor and improved timer by fg-cfh · Pull Request #76 · dot15d4-rs/dot15d4 · GitHub.

I don't buy this API at all for the reasons you outlined. Even if allocation is not available, you can achieve similar things via a static variable.

Requiring a 'static lifetime for the task isn't sufficient here. Right now, task_ptr is not a pointer to the task. It's a pointer to a pointer to the task, and the intermediate pointer is on the stack of spawn, creating the same problems as a non-static future.

Also, when you set task_ptr to null in case of cancellation, you need to wait for the interrupt in case it is still using the pointer.

@alice Thank you for looking into this! Really appreciated. Let me see whether I understood your feedback correctly.

I think there are two parts to your review:

Access to a dangling intermediate task pointer

Good observation. I'll try to fix this in the next iteration, might be a fundamental show-stopper for a fully safe solution, though. Update: No - I actually prefer the unsafe "scoped task" approach after exploring the problem space further. Borrowing w/o synchronization is too attractive in our case to be given up in favor of safe 'static tasks.

Access to a dangling task pointer by the interrupt

Have you seen that task_ptr has static lifetime and atomically gates access to memory by the interrupt? If the interrupt fires after we set this pointer to null, then the interrupt handler will not access any other memory. IMHO this should be safe because the interrupt always runs at a higher priority than the scheduling context (which is a documented safety invariant), so there's no way to invalidate memory from scheduling context or set the task_ptr to null while the interrupt is running. In other words: There's no need do anything special to wait for the interrupt because the scheduling context will simply not be able to run while the interrupt is active.

What am I missing here?

Unsafe solution in the meantime?

So the 'static solution is out for the moment being, what about the unsafe solution from my original post?

Wouldn't it be valid if I made spawn() unsafe and document that the task passed to spawn() must not be leaked before driven to completion, similarly to what the async-scoped crate does. Then, both the task and the intermediate pointer to the pinned task would be safely dropped (or forgotten) after use, right?

Further references for whoever happens to find this thread in search and wants more context:

  • An excellent write-up by Tyler Mandry about scoped tasks: See "relaxing safety" in the "Subsets of Features" section, which is the choice I currently favor in our context. After all we're trying to solve a specific practical problem local to our crate. Our interrupt executor is not meant to be exposed outside of that crate.
  • Another great blog post by without.boats describing the "trilemma of scoped tasks": It's obvious that we want "Parallelizability + Borrowing", which cannot be done safely in Rust as has been pointed out in the references cited in my original post.
  • The completion crate that chooses another unsafe approach to achieve a similar goal: futures that must be driven to completion once they have been polled once.
  • The moro crate which has another detailed explanation why "parallel + scope" cannot currently been done safely in Rust.
  • The unleakable types section in the docs of the keyword generics initiative has lots of related links concerning linear types and effects.

Following references in these documents one can further explore the subject area. It seems that the design space I stumbled upon is actually quite well understood and there's lots of current and future options to choose from. Fascinating topic.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.