Why does the transition from `REGISTERING` to `WAITING` in the successful CAS operation need `Acquire` when implementing the AtomicWaker?

    pub fn register(&self, waker: &Waker) {
        match self
            .state
            .compare_exchange(WAITING, REGISTERING, Acquire, Acquire)
            .unwrap_or_else(|x| x)
        {
            WAITING => {
                unsafe {
                    // Lock acquired, update the waker cell

                    // Avoid cloning the waker if the old waker will awaken the same task.
                    match &*self.waker.get() {
                        Some(old_waker) if old_waker.will_wake(waker) => (),
                        _ => *self.waker.get() = Some(waker.clone()),
                    }

                    // Release the lock. If the state transitioned to include
                    // the `WAKING` bit, this means that at least one wake has
                    // been called concurrently.
                    //
                    // Start by assuming that the state is `REGISTERING` as this
                    // is what we just set it to. If this holds, we know that no
                    // other writes were performed in the meantime, so there is
                    // nothing to acquire, only release. In case of concurrent
                    // wakers, we need to acquire their releases, so success needs
                    // to do both.
                    let res = self.state.compare_exchange(REGISTERING, WAITING, AcqRel, Acquire);

                    match res {
                        Ok(_) => {
                            // memory ordering: acquired self.state during CAS
                            // - if previous wakes went through it syncs with
                            //   their final release (`fetch_and`)
                            // - if there was no previous wake the next wake
                            //   will wake us, no sync needed.
                        }
                        Err(actual) => {
                            // This branch can only be reached if at least one
                            // concurrent thread called `wake`. In this
                            // case, `actual` **must** be
                            // `REGISTERING | WAKING`.
                            debug_assert_eq!(actual, REGISTERING | WAKING);

                            // Take the waker to wake once the atomic operation has
                            // completed.
                            let waker = (*self.waker.get()).take().unwrap();

                            // We need to return to WAITING state (clear our lock and
                            // concurrent WAKING flag). This needs to acquire all
                            // WAKING fetch_or releases and it needs to release our
                            // update to self.waker, so we need a `swap` operation.
                            self.state.swap(WAITING, AcqRel);

                            // memory ordering: we acquired the state for all
                            // concurrent wakes, but future wakes might still
                            // need to wake us in case we can't make progress
                            // from the pending wakes.
                            //
                            // So we simply schedule to come back later (we could
                            // also simply leave the registration in place above).
                            waker.wake();
                        }
                    }
                }
            }
            WAKING => {
                // Currently in the process of waking the task, i.e.,
                // `wake` is currently being called on the old task handle.
                //
                // memory ordering: we acquired the state for all
                // concurrent wakes, but future wakes might still
                // need to wake us in case we can't make progress
                // from the pending wakes.
                //
                // So we simply schedule to come back later (we
                // could also spin here trying to acquire the lock
                // to register).
                waker.wake_by_ref();
            }
            state => {
                // In this case, a concurrent thread is holding the
                // "registering" lock. This probably indicates a bug in the
                // caller's code as racing to call `register` doesn't make much
                // sense.
                //
                // memory ordering: don't care. a concurrent register() is going
                // to succeed and provide proper memory ordering.
                //
                // We just want to maintain memory safety. It is ok to drop the
                // call to `register`.
                debug_assert!(state == REGISTERING || state == REGISTERING | WAKING);
            }
        }
    }

As the comment says, when this CAS succeeds there cannot have been a concurrent waker: a concurrent `wake` would have changed the state to `REGISTERING | WAKING`, and the CAS cannot succeed from that state.
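To make that concrete, here is a minimal state-only sketch of the interaction. It is not the library's code: the bit values of the constants, the helper names `begin_wake` and `unlock`, and the `AcqRel` ordering on the `fetch_or` are my assumptions (the comments in the post only say that `wake` does a releasing `fetch_or` on `WAKING`).

```rust
use std::sync::atomic::{AtomicUsize, Ordering::{AcqRel, Acquire}};

// Assumed bit values; the post does not show the constants.
pub const WAITING: usize = 0;
pub const REGISTERING: usize = 0b01;
pub const WAKING: usize = 0b10;

/// Hypothetical model of the first step of `wake`: set the WAKING bit
/// and return the state that was observed before the update.
pub fn begin_wake(state: &AtomicUsize) -> usize {
    state.fetch_or(WAKING, AcqRel)
}

/// The unlock CAS from `register`, with the same orderings as in the post.
pub fn unlock(state: &AtomicUsize) -> Result<usize, usize> {
    state.compare_exchange(REGISTERING, WAITING, AcqRel, Acquire)
}
```

If `begin_wake` runs while the state is `REGISTERING`, a following `unlock` returns `Err(REGISTERING | WAKING)`; without it, `unlock` returns `Ok(REGISTERING)` and leaves the state at `WAITING`.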

So why does the load part of the CAS need `Acquire`? The `REGISTERING` state can only be set by the first CAS above, which changes the state from `WAITING` to `REGISTERING`, and that CAS is sequenced before the second one.

Moreover, the memory order for a failed CAS is given by the last argument, which is already `Acquire`. So, IIUC, the success ordering of this CAS only needs `Release`.
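The variant I have in mind would look roughly like this. It is a sketch of the state transition only, with assumed constant values and a hypothetical helper name `unlock_release_only`. It also assumes Rust 1.64 or later, where the failure ordering is allowed to be stronger than the success ordering.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::{Acquire, Release}};

// Assumed bit values; the post does not show the constants.
pub const WAITING: usize = 0;
pub const REGISTERING: usize = 0b01;
pub const WAKING: usize = 0b10;

/// Hypothetical variant of the unlock CAS: `Release` on success (publish
/// the write to the waker cell), `Acquire` on failure (observe the
/// concurrent wakers that set the WAKING bit).
pub fn unlock_release_only(state: &AtomicUsize) -> Result<usize, usize> {
    state.compare_exchange(REGISTERING, WAITING, Release, Acquire)
}
```

The state transitions are the same as with `AcqRel`; only the ordering guarantees on the success path differ, and that difference is what I am asking about.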

So I don't understand what this part of the comment means:

> In case of concurrent wakers, we need to acquire their releases, so success needs to do both.

If there were a concurrent waker, its call would change the state to `REGISTERING | WAKING`, and the CAS could not succeed. So what is the effect of using `Acquire` for the load part of the RMW operation here?