Shared memory, fences, and IO

I have code which uses UNIX sockets and shared memory for inter-process communications.

Let's assume that we have two processes which have a shared memory region. In one process:

// write to shared memory

fence(Release);
sendmsg(...)?;
// do some other stuff
let msg = recvmsg(...)?;
fence(Acquire);

// read from shared memory

And in another process:

let msg = recvmsg(...)?;
fence(Acquire);
// read data from shared memory, process it, and write result back
fence(Release);
sendmsg(...)?;

AFAIK this is a pretty standard way of doing this kind of IPC, but by reading the docs I couldn't construct a proper argument for why this code is correct.

In other words, can we consider IO operations as the "atomic" operations described in the fence docs, i.e. can sendmsg and recvmsg play the role of x.store(3, Relaxed) and x.load(Relaxed) respectively? In particular, what prevents the compiler from theoretically reordering sendmsg to before fence(Release) in the first process?

cc @RalfJung


Disclaimer: I'm not an expert at all; essentially just answering in the hope that someone will correct / expand on this.

It seems the main issue is that this concerns the entire combination of machine semantics, OS semantics, and language semantics, and I would actually be surprised if that combination is sufficiently well-specified.

Focusing on the Rust side at first, I think everything would have well-defined semantics if fence followed by sendmsg was a single "external" operation, and so was recvmsg followed by fence. E.g. if both were C functions, and the shared memory is accessed via a raw pointer that may be aliased, things basically come down to how the C functions access the shared memory (or cause accesses later).

In particular, the compiler would definitely not be allowed to reorder these function calls with accesses to the shared memory, unless it could prove that the functions cannot possibly access the same memory without causing undefined behavior. This is an important caveat: e.g. if you hold a reference across such a call, that's essentially a promise that the corresponding location within the shared memory will not be written to, or not even read, depending on the kind of reference.

Moreover, the two processes are required to cooperate in order to avoid data races (unless atomic accesses are used everywhere). Not sure if it's a good idea to have undefined behavior in case another process misbehaves.

I think these conditions can be formulated as safety requirements on the external functions that combine the memory fence and system call, and then on the Rust side it's just a matter of upholding these requirements.

The next questions would then be: Is it OK to implement those two functions in (unsafe) Rust instead, and if yes, is fence the correct mechanism for their implementation?

I believe the answer to the first question is yes, simply because naively translating a C function to unsafe Rust should yield a Rust function with the same semantics. But then I'd say that fence is actually at the wrong abstraction level. As you observed, the docs don't really specify its semantics at the machine/OS level (unless you read it into the word "CPU" there). IMHO, what's really needed is something lower level that is guaranteed to interact with sendmsg and recvmsg in the correct way.

I guess it would have to be "whatever you would use in C there". Unfortunately it raises the question whether that is sufficiently well-defined.

Yes, holding in the first process an exclusive reference to shared memory mutated in the second process across the sendmsg/recvmsg pair would be obvious UB. I don't think the requirements here are in any way different from those for memory shared across threads.

I think it's not different from trusting system APIs or other shared libraries. We just postulate that program correctness depends on adherence to the specified communication protocol by both programs.

This notion of "external" operations is not well defined. In the extreme case, sendmsg can be an asm! block which does not accept any references to the shared memory, i.e. IIUC the compiler may assume that it does not read/write that memory (unless we store a pointer to it in a global variable).

We probably can split the problem into two parts:

  1. Interaction of fence with non-pure code. It should probably be sufficient to specify that fence does not allow reordering of other reads/writes across it (which would automatically include calls to non-pure functions), i.e. it always acts like compiler_fence(SeqCst) regardless of the ordering used.
  2. Can we model memory shared across processes in the same way as memory shared across threads? In theory, the compiler may perform "smart" program analysis and decide that since the memory is written only in one place in the program, it can "optimize" the subsequent read. In the worst case scenario, the answer may be "no, you have to use volatile operations for memory shared across processes".

The "whatever you would use in C there" approach is probably the only practical option right now, but it would be nice to have more clarity where exactly the gray (or maybe even black) zone starts.

It wouldn't be marked as options(pure) and thus the compiler has to assume that there may be a SeqCst atomic operation in the body of the asm block which will synchronize with the fence and as a result the asm block can't be reordered across the fence.

The catch is that this postulate is a safety requirement, so all code that can't enforce this property on its own is unsafe. It's probably not such a big deal, e.g. you could wrap the file descriptor in a struct with an unsafe fn new that says "safety: the caller must ensure that the process at the other end correctly implements the protocol", and then all code that receives an instance of the struct can rely on it for safety. But technically, if you let the user specify an arbitrary socket/process, your main function would have to become unsafe. (No idea if that is allowed.)
But yeah, it's a bit of a tangent.

That can't be the full story because then no FFI would be well-defined.

I was specifically referring to the case where fence followed by sendmsg is implemented as a single C function, because I wanted to focus on the Rust side first. It's a bit subtle: sendmsg on its own may not have sufficiently well-defined semantics, but the combination could be described as an unsafe function with certain safety requirements regarding the shared memory. My point is: It would be just like every other C function called from Rust then.

That's one way to split the problem, but I was trying to split it into two different parts: The "pure Rust" side, where everything has well-defined semantics in terms of (unsafe) Rust, and the side I called "external" which is basically just "code behind a FFI". The advantage of this way of splitting the problem is that it's just like any other FFI. In particular, the notion of processes isn't really part of the interface, it's just "what safety requirements do the callers need to fulfill, in terms of accessing the shared memory?"

In particular, the answer to your second question is clearly "yes" then, because whether the code behind the FFI communicates with a separate process becomes an implementation detail of that code. It could just as well spawn a thread and access the memory from there.

Ah, true. I guess the same logic applies to calling an arbitrary FFI function.

I don't think that "externality" is a useful property. Depending on build options the same FFI function may come from a shared library, or be linked statically. In the latter case LTO may be applied, which would defeat any reasoning based on "externality". Or I could use rustix, which depending on enabled crate features can link to libc's sendmsg, or use raw syscalls implemented with asm!. In the former case we could also build the program for a MUSL target.

So I think it's better to talk about properties which apply equally to both FFI calls and asm! blocks (without pure and nomem options). For example, in the following code:

let (mut a, mut b) = (0, 0);
ffi_fn_or_asm_block(&mut a);
use_b(&b);

The compiler has the right to assume that the value of b does not change across the ffi_fn_or_asm_block call, despite the theoretical possibility of this function writing into the stack memory allocated for b by calculating an offset relative to a or to the frame pointer. We simply declare that such FFI functions are not correct and cannot be called safely from Rust.

Yeah, it could spawn a thread, but similarly to the example above, if we did not pass a reference to the shared memory (explicitly or via a global), then the compiler may theoretically assume that this memory cannot be accessed in this potentially spawned thread.

In other words, the issue is that memory sharing is done not through pointers, but using file descriptors, which are not part of the memory model.

So I think it's better to talk about properties which apply equally to both FFI calls and asm! blocks

That's essentially exactly what I meant when I said "external".
Alternatively, you could take "external" to mean "everything that falls outside of the scope of the Rust abstract machine".

Yeah, it could spawn a thread, but similarly to the example above, if we did not pass a reference to the shared memory (explicitly or via a global), then the compiler may theoretically assume that this memory cannot be accessed in this potentially spawned thread.

No, because the pointer to the shared memory entered the Rust code via the FFI in the first place. Lacking any further information about this pointer, the compiler has to assume that the memory may be aliased by code behind the FFI -- which is actually a rather standard situation.

However, accessing memory through the pointer, or creating references from it, allows the compiler to rule out all aliasing that would cause UB. (Which is why I specifically mentioned references held across the FFI call.)

Spelling out why IPC is correct fully formally would be an interesting but challenging topic for a PhD thesis. :wink: I don't think it's been done before.

Spelling it out informally would still take a while. And there's more than one way you could try to do it. The high-level summary for the approach I would propose is that you have to axiomatically describe what all the opaque operations (FFI calls and asm blocks) do, in a way that (a) everything in that axiomatic description could be done by regular Rust code, and (b) the description behaves identically (insofar as it can be observed inside the Abstract Machine) to what the FFI/asm code actually does.

Taking the view of just one of the processes, I would say: when you set up the shared memory region, you logically set up an axiomatic thread in your current AM that performs reads and writes mirroring what the other process does in that memory region. When you call recvmsg, that axiomatically performs a release-acquire pair with that axiomatic thread that sent the message, establishing a happens-before relationship. So assuming the other process no longer accesses that memory, you can now do non-atomic accesses and will see the data that the other thread wrote. (So an explicit fence is not required, it can axiomatically happen as part of the opaque FFI call.) When you then call sendmsg, that is an axiomatic release operation and whoever acquires that will happen-after all the non-atomic accesses you did in-between.
