What I don't get is how having a fence in the else arm gains anything since the null check would have already been false to get into this arm in the first place.
From my limited understanding, acquire/release is primarily about how non-atomic memory operations are ordered relative to the atomic ones. So, in this case, the memory we care about is the pointer’s target rather than the pointer itself. If the pointer isn’t pointing to anything, there is nothing to read through it, so we don’t need the synchronization overhead.
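On the other hand, if the pointer is referencing some data, we need the fence so that all of the writes the other thread made to that target data before publishing the pointer are guaranteed to be visible to us. (As far as I understand, the fence doesn't literally wait for anything; it only orders our later reads after the load that saw the pointer.) Otherwise, the read of *p could race with those writes and see some arbitrary partial state (thus being UB).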
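
To make the reasoning above concrete, here is a minimal sketch of the pattern as I understand it, assuming the code in question is Rust and that the writer publishes the pointer with a Release store. The names `publish` and `try_read` and the `u64` payload are made up for illustration and are not from the code under discussion.

```rust
use std::ptr;
use std::sync::atomic::{fence, AtomicPtr, Ordering};
use std::thread;

/// Writer side (hypothetical helper): fill in the target with a plain write,
/// then publish the pointer. The Release store is what the reader's fence
/// pairs with.
fn publish(slot: &AtomicPtr<u64>, value: u64) {
    // The Box is intentionally leaked to keep the sketch short.
    let p = Box::into_raw(Box::new(value));
    slot.store(p, Ordering::Release);
}

/// Reader side (hypothetical helper): relaxed load first, and only pay for
/// the acquire fence in the arm that actually dereferences the pointer.
fn try_read(slot: &AtomicPtr<u64>) -> Option<u64> {
    let p = slot.load(Ordering::Relaxed);
    if p.is_null() {
        // Nothing was published, so there is no target memory to synchronize on.
        None
    } else {
        // The relaxed load saw the Release-stored pointer; the fence makes the
        // writer's earlier writes to *p visible before we read through it.
        fence(Ordering::Acquire);
        // SAFETY: p came from Box::into_raw and is never freed in this sketch.
        Some(unsafe { *p })
    }
}

static SLOT: AtomicPtr<u64> = AtomicPtr::new(ptr::null_mut());

fn main() {
    let writer = thread::spawn(|| publish(&SLOT, 42));
    // Spin until the writer has published something.
    let value = loop {
        if let Some(v) = try_read(&SLOT) {
            break v;
        }
        std::hint::spin_loop();
    };
    writer.join().unwrap();
    assert_eq!(value, 42);
}
```

Doing the load with `Ordering::Acquire` directly would also be correct; the relaxed load plus a fence in the non-null arm just avoids the acquire cost on the null path, which is the trade-off being discussed here.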