I have an array of atomics, let's say [AtomicU64; 512]. During the first stage I fill this buffer using atomic operations, and during the second stage the buffer stays read-only (there is proper memory synchronization between the stages).
Now, at the start of the second stage I also want to write the data to disk. The obvious solution would be to copy data to a scratch buffer using atomic operations and write data from this buffer, but it's obviously inefficient. Would it be sound to cast the original buffer to [u8; 4096] and pass it to the write syscall despite the fact that other threads may read data from it using atomic operations?
In my opinion, it should be sound. But there is this unfortunate restriction from the Intel manual:
Software should access semaphores (shared memory used for signalling between
multiple processors) using identical addresses and operand lengths. For example,
if one processor accesses a semaphore using a word access, other processors
should not access the semaphore using a byte access
In my understanding, this restriction applies only to mutating atomic operations ("semaphores", "signalling between multiple processors"), so it's not applicable to my case.
You have an interesting definition of “inefficient”. For 4 KiB of data written once to disk I would just do the transformation and move on. The on-disk format would then be exactly what I want, and correct.
The main difference is that Rust permits concurrent atomic and non-atomic reads to the same memory as those cause no issue in the C++ memory model, they are just forbidden in C++ because memory is partitioned into “atomic objects” and “non-atomic objects” (with atomic_ref temporarily converting a non-atomic object into an atomic object).
In particular, the next paragraph is also relevant:
The most important aspect of this model is that data races are undefined behavior. A data race is defined as conflicting non-synchronized accesses where at least one of the accesses is non-atomic. Here, accesses are conflicting if they affect overlapping regions of memory and at least one of them is a write. They are non-synchronized if neither of them happens-before the other, according to the happens-before order of the memory model.
Since you are going to guarantee that there are no more writes after the switch, it should be sound to cast &[AtomicU64; _] to &[u8; _]: there won't be any data races, the alignment of u8 is not greater than the alignment of AtomicU64, and the size of the target [u8; _] matches the source (8 bytes per AtomicU64).
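A minimal sketch of what that cast could look like, assuming the buffer really is read-only by the time it is called. The names as_bytes and dump are hypothetical, and any Write sink stands in for the file:

```rust
use std::io::{self, Write};
use std::sync::atomic::AtomicU64;

const LEN: usize = 512;

/// View the atomic array as plain bytes (hypothetical helper).
///
/// # Safety
/// No thread may write to `buf` while the returned reference is in use,
/// unless that write is ordered by happens-before with every read through
/// it. Concurrent atomic *loads* by other threads are fine.
unsafe fn as_bytes(buf: &[AtomicU64; LEN]) -> &[u8; LEN * 8] {
    // AtomicU64 has the same size and bit validity as u64, and u8 has
    // alignment 1, so the layout requirements of the cast are met.
    unsafe { &*(buf as *const [AtomicU64; LEN] as *const [u8; LEN * 8]) }
}

/// Write the buffer to any `Write` sink (a `File` in the real program).
///
/// # Safety
/// Same requirement as `as_bytes`: only call this in the read-only stage.
unsafe fn dump<W: Write>(buf: &[AtomicU64; LEN], out: &mut W) -> io::Result<()> {
    let bytes = unsafe { as_bytes(buf) };
    out.write_all(bytes)
}
```

Both functions are marked unsafe because the "no more writes" guarantee is something only the caller can uphold; the type system cannot check it.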
Yeah I agree with @RustyYato -- this seems entirely harmless. From what you say it should be legal to do non-atomic reads from that memory at the time of the write-to-disk, and so if that is legal then surely the kernel doing reads is legal.
There'd be an interesting discussion if you needed the kernel reads to be atomic (which one might argue they could be), but that doesn't even seem to apply to your case.
If Intel CPUs don't support one thread doing a 4-byte read and another thread doing an 8-byte read from the same memory, then surely we'd have bigger problems.^^ Even entirely safe Rust code can trigger that condition easily.
I have a different interesting use case. Right now we have a similar atomic array which is used as a bitmap (every entry takes 4 bits). We want to dump this bitmap to disk from time to time. The bitmap is purely advisory and its 4-bit entries get renewed from time to time, so we do not care about potential tearing problems (i.e. we are fine if the first and last 4 bits in the bitmap get written from different points in time), but reconstructing the full array is a relatively costly operation, so we want to read a cached version on startup and during operation.
The problem is that the array gets constantly mutated and taking a lock on it would have a big performance impact. Ideally, we would just write from this buffer directly, but right now we are being conservative and first create a buffer copy using a loop with relaxed atomic loads/stores and then write the copied data to disk. What do you think, would it be sound to write the bitmap to disk directly in this case?
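For reference, the conservative approach described above could be sketched roughly like this. The name dump_snapshot is hypothetical, the scratch buffer is a plain Vec, and native-endian bytes are an assumption chosen to match what a direct write of the in-memory array would produce:

```rust
use std::io::{self, Write};
use std::sync::atomic::{AtomicU64, Ordering};

/// Copy a concurrently mutated bitmap into a private buffer, then write it.
fn dump_snapshot<W: Write>(bitmap: &[AtomicU64], out: &mut W) -> io::Result<()> {
    let mut scratch = Vec::with_capacity(bitmap.len() * 8);
    for word in bitmap {
        // Each load is atomic for its own 64-bit word, so a single word is
        // never torn. Different words may still come from different moments.
        let value = word.load(Ordering::Relaxed);
        scratch.extend_from_slice(&value.to_ne_bytes());
    }
    // The kernel only ever reads `scratch`, which no other thread can touch.
    out.write_all(&scratch)
}
```

This needs no unsafe code at all; the cost is one extra pass over the bitmap plus the scratch allocation.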
I agree, with one note: doing the switch may require some sort of synchronization, to ensure that all writes are actually finished and visible to the reader.
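One way such a switch could be synchronized is a release/acquire flag. This is only a sketch: fill_then_read and ready are made-up names, and joining the writer threads before starting the second stage would establish the same happens-before relationship.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::thread;

/// Fill `buf` on one thread, publish it with a flag, and read it on another.
/// Returns the sum seen by the reader.
fn fill_then_read(buf: &[AtomicU64]) -> u64 {
    let ready = AtomicBool::new(false);
    thread::scope(|s| {
        // Stage one: fill the buffer, then publish.
        s.spawn(|| {
            for (i, slot) in buf.iter().enumerate() {
                slot.store(i as u64, Ordering::Relaxed);
            }
            // Release: every store above happens-before any Acquire load
            // that observes `true`.
            ready.store(true, Ordering::Release);
        });
        // Stage two: wait for the switch, then treat the buffer as read-only.
        let reader = s.spawn(|| {
            while !ready.load(Ordering::Acquire) {
                std::hint::spin_loop();
            }
            buf.iter().map(|slot| slot.load(Ordering::Relaxed)).sum::<u64>()
        });
        reader.join().unwrap()
    })
}
```

Once the Acquire load has seen the flag, all stage-one writes are ordered before the reader's accesses, which is the precondition the byte-cast argument relies on.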
Yeah that's exactly the question of whether the reads done by write are atomic or not. Practically speaking we can probably consider them to behave like a bytewise atomic relaxed memcpy to some private kernel memory -- except that there is no such operation in Rust right now.
So, once this RFC gets accepted+implemented, that should be sound. Until then it's in the gray zone.
Unfortunately that RFC has been stalled for a long time on one of the usual problems in Rust: bikeshedding the perfect API. Which is really odd, since it would be better to get it in as unstable and experiment with what works. Then we can decide on an API before stabilisation.
It's not just about the API. As far as I can see, LLVM also just does not support those operations yet. And there are some open questions around their semantics (related to mixed-sized accesses).
The API thing is just what is most visible since everyone immediately joins that bikeshed. But if it was just that, someone would have pushed through a preliminary unstable API by now.
You are talking about “points in time”, but, of course, without atomics such a thing doesn't exist, not even on x86: creating a single notion of “time” is the whole point of them, after all.
Would it be OK for you to get something pretty stale next to something very new, in the same snapshot?
If all elements are absolutely independent then this should work, otherwise no.