What are the exact guarantees for `ptr::read/write_volatile`?

I know that the `*_volatile` functions are not allowed to assume that a value in memory stays unchanged between reads and writes. But a peripheral sometimes has several connected memory-mapped registers, and the order in which they are written matters. Is the compiler allowed to reorder volatile reads and writes to different addresses? If so, can this be prevented with `compiler_fence` or `fence` from `std::sync::atomic`?
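A minimal sketch of that situation, assuming two hypothetical registers `config` and `enable` where `enable` must be written last. To keep it runnable on a desktop, the "registers" are an ordinary struct rather than real MMIO addresses. Both writes are volatile, which is what stops the compiler from eliding or reordering them relative to each other; the `compiler_fence` emits no CPU instruction and only restricts what the compiler may move across it.

```rust
use std::ptr;
use std::sync::atomic::{compiler_fence, Ordering};

/// Hypothetical register block standing in for memory-mapped I/O.
/// On real hardware this would live at an address from the reference manual.
#[repr(C)]
pub struct FakeRegs {
    pub config: u32, // hypothetical CONFIG register
    pub enable: u32, // hypothetical ENABLE register, must be written after CONFIG
}

/// Writes CONFIG, then ENABLE, through volatile stores.
///
/// # Safety
/// `regs` must point to a valid, writable `FakeRegs`.
pub unsafe fn configure_then_enable(regs: *mut FakeRegs, config: u32) {
    unsafe {
        ptr::write_volatile(ptr::addr_of_mut!((*regs).config), config);
        // Not needed to order the two volatile writes (volatile already does
        // that), but it also keeps the compiler from moving ordinary memory
        // accesses across this point. It says nothing about the CPU.
        compiler_fence(Ordering::SeqCst);
        ptr::write_volatile(ptr::addr_of_mut!((*regs).enable), 1);
    }
}

fn main() {
    let mut regs = FakeRegs { config: 0, enable: 0 };
    // SAFETY: `regs` is a valid, exclusively borrowed `FakeRegs`.
    unsafe { configure_then_enable(&mut regs, 0xA5) };
    println!("config = {:#x}, enable = {}", regs.config, regs.enable);
}
```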

The armcc code provides three barriers: an instruction synchronization barrier (ISB), a data synchronization barrier (DSB), and a data memory barrier (DMB). What are the differences between them, and how do they correspond to what is available in std?

Copied C source:

```c
/**
  \brief   Instruction Synchronization Barrier
  \details Instruction Synchronization Barrier flushes the pipeline in the processor,
           so that all instructions following the ISB are fetched from cache or memory,
           after the instruction has been completed.
 */
#define __ISB() do {\
                   __schedule_barrier();\
                   __isb(0xF);\
                   __schedule_barrier();\
                } while (0U)

/**
  \brief   Data Synchronization Barrier
  \details Acts as a special kind of Data Memory Barrier.
           It completes when all explicit memory accesses before this instruction complete.
 */
#define __DSB() do {\
                   __schedule_barrier();\
                   __dsb(0xF);\
                   __schedule_barrier();\
                } while (0U)

/**
  \brief   Data Memory Barrier
  \details Ensures the apparent order of the explicit memory operations before
           and after the instruction, without ensuring their completion.
 */
#define __DMB() do {\
                   __schedule_barrier();\
                   __dmb(0xF);\
                   __schedule_barrier();\
                } while (0U)
```
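For comparison, a sketch of the closest Rust counterparts; the mapping is approximate, not an official correspondence, and the function names are hypothetical. `compiler_fence` roughly plays the role of armcc's `__schedule_barrier()` (compiler-only, no instruction), and `fence` is the nearest std analogue of `__DMB()`. std has nothing for ISB or DSB, so those are written with `core::arch::asm!` and only compiled on 32-bit ARM targets (the third-party `cortex-m` crate wraps the same instructions as `cortex_m::asm::isb()`, `dsb()` and `dmb()`). The `sy` option is the `0xF` argument used by the CMSIS macros.

```rust
use std::sync::atomic::{compiler_fence, fence, Ordering};

/// Compiler-only barrier, roughly like armcc's `__schedule_barrier()`:
/// emits no instruction, only stops the compiler moving memory accesses across it.
#[inline(always)]
pub fn schedule_barrier() {
    compiler_fence(Ordering::SeqCst);
}

/// Nearest std analogue of `__DMB()`: a hardware memory fence. On ARMv7 and
/// later, LLVM typically lowers this to a `dmb` variant, but std only promises
/// the memory-model semantics, not a particular instruction.
#[inline(always)]
pub fn dmb_like() {
    fence(Ordering::SeqCst);
}

/// `__DSB()` counterpart (32-bit ARM only): completes when all earlier explicit
/// memory accesses have completed. An `asm!` block without the `nomem` option
/// is assumed to touch memory, so it is also a compiler barrier, which covers
/// the `__schedule_barrier()` calls in the C macro.
#[cfg(target_arch = "arm")]
#[inline(always)]
pub fn dsb() {
    // SAFETY: `dsb sy` only waits for outstanding memory accesses.
    unsafe { core::arch::asm!("dsb sy", options(nostack, preserves_flags)) };
}

/// `__ISB()` counterpart (32-bit ARM only): flushes the pipeline so later
/// instructions are refetched, e.g. after writing a control register.
#[cfg(target_arch = "arm")]
#[inline(always)]
pub fn isb() {
    // SAFETY: `isb sy` only flushes the instruction pipeline.
    unsafe { core::arch::asm!("isb sy", options(nostack, preserves_flags)) };
}

fn main() {
    schedule_barrier();
    dmb_like();
    #[cfg(target_arch = "arm")]
    {
        dsb();
        isb();
    }
    println!("barriers executed");
}
```

On a non-ARM host only the two std-based functions exist, which is itself a reminder that std has no portable notion of DSB or ISB.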

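Going by the documentation, the compiler may not elide volatile reads and writes or reorder them relative to other volatile operations on the same thread, even when they target different addresses. That guarantee covers the compiler only: whether the CPU can reorder the accesses depends on the hardware and on the memory type of the addresses involved. Volatile also says nothing about atomicity or inter-thread synchronisation; a volatile access behaves like a non-atomic one, so a race between a volatile read and a write to the same location is undefined behaviour. If you need multi-threaded synchronisation guarantees, use atomics, or `std::sync::atomic::fence()` paired with atomic loads and stores.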
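A sketch of that pattern, assuming a hypothetical shared buffer: the data is written and read with `write_volatile`/`read_volatile`, and all of the cross-thread ordering comes from an atomic flag plus a release/acquire pair of `fence`s. Without the atomic flag, the same volatile accesses would be a data race.

```rust
use std::cell::UnsafeCell;
use std::hint;
use std::ptr;
use std::sync::atomic::{fence, AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical shared buffer: `data` uses volatile (i.e. non-atomic) accesses,
// `ready` is the atomic flag that carries the synchronisation.
struct Shared {
    data: UnsafeCell<u32>,
    ready: AtomicBool,
}

// SAFETY: `data` is written only before `ready` is set and read only after
// `ready` is observed, with release/acquire fences in between, so the
// accesses never race.
unsafe impl Sync for Shared {}

fn publish_and_read(value: u32) -> u32 {
    let shared = Arc::new(Shared {
        data: UnsafeCell::new(0),
        ready: AtomicBool::new(false),
    });

    let producer = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            unsafe { ptr::write_volatile(shared.data.get(), value) };
            // Release fence: the data write above happens-before any thread
            // that observes the flag store below and then issues an acquire fence.
            fence(Ordering::Release);
            shared.ready.store(true, Ordering::Relaxed);
        })
    };

    // Spin until the producer's flag store becomes visible.
    while !shared.ready.load(Ordering::Relaxed) {
        hint::spin_loop();
    }
    // Acquire fence: pairs with the release fence through the flag.
    fence(Ordering::Acquire);
    let seen = unsafe { ptr::read_volatile(shared.data.get()) };

    producer.join().unwrap();
    seen
}

fn main() {
    println!("{}", publish_and_read(42));
}
```

`publish_and_read(42)` returns 42. The fences could be folded into the flag itself (`store(true, Ordering::Release)` and `load(Ordering::Acquire)`); the standalone `fence` form is shown because that is the function mentioned above.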