It says that a read-modify-write operation takes place if the comparison with current succeeds and a load operation takes place when the comparison fails. But for the comparison in both cases, an initial load operation is needed, right? Which ordering will be used for this initial load operation?
There isn't an “initial” load: the entire RMW is done as an atomic unit. If the operation succeeds, the entire RMW is done with the success ordering. If it fails, the old value is read with the failure ordering.
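To make the two orderings concrete, here is a minimal sketch. The helper name `try_swap` and the choice of AcqRel/Relaxed are just assumptions for illustration, not anything from the docs:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical helper: try to move `slot` from `expected` to `new`.
/// On success the whole read-modify-write uses AcqRel (the success ordering).
/// On failure only a load happens, and it uses Relaxed (the failure ordering).
fn try_swap(slot: &AtomicUsize, expected: usize, new: usize) -> Result<usize, usize> {
    slot.compare_exchange(expected, new, Ordering::AcqRel, Ordering::Relaxed)
}

fn main() {
    let slot = AtomicUsize::new(5);

    // Comparison succeeds: the value is replaced and the previous value is returned in Ok.
    assert_eq!(try_swap(&slot, 5, 10), Ok(5));

    // Comparison fails: nothing is stored and the value actually seen is returned in Err.
    assert_eq!(try_swap(&slot, 5, 20), Err(10));
    assert_eq!(slot.load(Ordering::Relaxed), 10);
}
```

Either way there is only one atomic access per call; the result just tells you which of the two orderings applied.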
Clarifying this behavior was one motivation for replacing compare_and_swap with compare_exchange, the main one being to allow weaker orderings, of course.
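Perhaps you'd find fetch_update's ordering easier to grok, since it actually does use a separate initial load, followed by a compare-exchange retry loop? (In exchange it loses the forward progress guarantee, because the update is a retry loop rather than a single atomic operation.)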
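Roughly, fetch_update behaves like the hand-written loop below. This is a sketch, not the standard library's source; `fetch_update_sketch` is a made-up name, and the orderings in `main` are only an example:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical hand-written stand-in for
/// `slot.fetch_update(set_order, fetch_order, f)`.
fn fetch_update_sketch(
    slot: &AtomicUsize,
    set_order: Ordering,
    fetch_order: Ordering,
    mut f: impl FnMut(usize) -> Option<usize>,
) -> Result<usize, usize> {
    // The separate initial load, done with `fetch_order`.
    let mut prev = slot.load(fetch_order);
    // The closure may run several times if other threads keep changing the value.
    while let Some(next) = f(prev) {
        // Success uses `set_order`; a failed attempt reads with `fetch_order`.
        match slot.compare_exchange_weak(prev, next, set_order, fetch_order) {
            Ok(old) => return Ok(old),
            Err(actual) => prev = actual, // retry with the value just observed
        }
    }
    // The closure returned None: nothing was stored.
    Err(prev)
}

fn main() {
    let a = AtomicUsize::new(7);
    let b = AtomicUsize::new(7);
    let double = |x: usize| Some(x * 2);

    // The real method and the sketch agree in this single-threaded case.
    assert_eq!(a.fetch_update(Ordering::AcqRel, Ordering::Acquire, double), Ok(7));
    assert_eq!(fetch_update_sketch(&b, Ordering::AcqRel, Ordering::Acquire, double), Ok(7));
    assert_eq!(a.load(Ordering::Relaxed), 14);
    assert_eq!(b.load(Ordering::Relaxed), 14);
}
```

Here the load really is a separate step, so it makes sense to ask which ordering it gets: it is the fetch ordering, the same one used when a compare-exchange attempt fails.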
If you haven't seen it yet, Mara Bos's Rust Atomics and Locks is an excellent resource and explainer not only on Rust's atomics but on atomics in general, and the content is available for free in webpage form.
Mostly because of portability of atomic semantics, partly because the "C++ atomic model" that everyone uses to prove properties about parallel code doesn't support LL/SC semantics, and partly because LL/SC only provides its strong guarantee for "small" critical sections when they're located in the same cache line.
Also, since fetch_update runs an arbitrary closure in the update loop, I don't want to think about what the result would be if you did another LL/SC update inside the closure.