Thanks for linking the C++ proposal. That's an interesting read that explains it quite well. As I interpret it, the current API mainly stems from the specifics of platforms that provide a dedicated CPU instruction for compare-and-exchange, not from platforms where it's constructed from separate loads and stores.
Is there even a point in restricting success/failure orderings? On architectures with load-linked/store-conditional instructions the load and store are distinct instructions which can each have their own memory ordering (with appropriate leading/trailing fences if required), whereas architectures with compare-and-exchange already have a limited set of instructions to choose from. The current limitation (assuming [LWG2445] is resolved) only seems to restrict compilers on load-linked/store-conditional architectures.
I imagine the API was chosen to create a consistent story between both types of architectures, limiting the more flexible to the more restrictive.
However, this doesn't make sense. On failure, there is no store! There's only a load, so the failure ordering can only be compared against the load half of the success ordering. So when you specify success: Release, failure: Acquire, the load for success is Relaxed, which is weaker than the load for failure, and the combination is thus rejected by the current API.
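To make the case concrete, here is a minimal sketch of the workaround the restriction forces. The helper name `try_transition` and the `u32` state are made up for illustration. It assumes the restriction described above is in effect, so it strengthens the success ordering to AcqRel purely so that Acquire is accepted as the failure ordering:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical helper (not from any library): tries to move `state`
/// from `expected` to `new`.
/// Returns Ok(previous value) on success, Err(observed value) on failure.
pub fn try_transition(state: &AtomicU32, expected: u32, new: u32) -> Result<u32, u32> {
    // What I'd actually like to write is
    //     success: Release, failure: Acquire
    // i.e. publish my writes if the exchange succeeds, and acquire the
    // other thread's writes if it fails.
    //
    // Under the restriction discussed above, that pair is rejected because
    // the load half of Release is Relaxed, which is weaker than Acquire.
    // Assumed workaround: upgrade success to AcqRel, paying for an acquire
    // on the success path that isn't needed.
    state.compare_exchange(expected, new, Ordering::AcqRel, Ordering::Acquire)
}
```

Calling `try_transition(&state, 0, 1)` on a fresh `AtomicU32::new(0)` returns `Ok(0)`; a second call with the same `expected` returns `Err(1)`, and that failed load is the only place where the Acquire ordering actually matters.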