Copy_nonoverlapping concurrently?

Is it sound to use copy_nonoverlapping::<u8> in multiple threads simultaneously on the same region of memory? Obviously it's liable to produce garbage, but will it cause unsoundness?

I don't think memcpy is UB under those circumstances, and given that copy_nonoverlapping's rustdocs describe it in terms of memcpy, I'd expect it to also not be UB. But I feel there's enough ambiguity about unsafe Rust that I wouldn't bet any amount of money on it :slight_smile:.
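
Concretely, I mean something like this (a made-up sketch; SendPtr is just a helper I invented to move the pointer across threads):

use std::ptr;
use std::thread;

// Raw pointers aren't Send, so a tiny wrapper is needed to move one into the
// spawned thread.
struct SendPtr(*mut u8);
unsafe impl Send for SendPtr {}

fn main() {
    let mut dst = [0u8; 64];
    let p = dst.as_mut_ptr();
    let other = SendPtr(p);

    let src_a = [0xAA_u8; 64];
    let src_b = [0xBB_u8; 64];

    // Two unsynchronized copies into the same region at the same time.
    let t = thread::spawn(move || unsafe {
        ptr::copy_nonoverlapping(src_a.as_ptr(), other.0, 64);
    });
    unsafe { ptr::copy_nonoverlapping(src_b.as_ptr(), p, 64) };
    t.join().unwrap();

    // `dst` may now hold any interleaving of 0xAA and 0xBB: garbage, but UB?
    println!("{:?}", &dst[..8]);
}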

I would think it would be the same as any other way you cause a data-race with unsafe code. Best avoided. Are you intending any sync at all?

It's a security thing - I need to be able to handle it because an untrusted process sharing memory with the current process can do arbitrary things. In particular, see this crate.

Sharing among multiple processes is a different ballgame than the multiple-threads case you first asked about, and Rust can't really do anything to protect you there. You basically need to treat it like volatile memory -- maybe wrapping it in UnsafeCell is enough, but I'm not sure.

They're actually more similar than you might expect, insofar as it's largely a question of whether LLVM treats the memory as a safe location to store temporary data. Take a look at the crate I linked to if you're curious about the details, but the TL;DR is: never access the memory directly, only through raw pointers.
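
Roughly, the shape is this (a sketch with made-up names, not the crate's actual API):

use std::ptr;

// Hold the shared region only as a raw pointer plus a length, and never
// create a &[u8] or &mut [u8] into it -- references would license the
// compiler to assume the memory is unaliased and unchanging.
pub struct SharedRegion {
    base: *mut u8,
    len: usize,
}

impl SharedRegion {
    /// Copy bytes out of the shared region into a private, process-local buffer.
    pub fn read_at(&self, offset: usize, dst: &mut [u8]) {
        assert!(offset + dst.len() <= self.len);
        unsafe { ptr::copy_nonoverlapping(self.base.add(offset), dst.as_mut_ptr(), dst.len()) };
    }
}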

OK, yes, I should have looked closer. That roughly supports what I said about inter-process memory being different, but you've extensively worked out how to deal with this. Neat stuff!

Here's to a formal Rust memory model :champagne:

How does one semantically use this buffer given it's a free-for-all? Is the intention that something else builds a coherent protocol over this, such that the processes actually carefully coordinate access?

I also notice there's a release fence for writes. But typically releasing stores are paired with acquiring loads, which I don't see. Maybe this is some Fuchsia thing, though.
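
For reference, the usual pairing I'd expect looks something like this (a generic sketch, not the crate's code):

use std::sync::atomic::{fence, AtomicBool, Ordering};

static READY: AtomicBool = AtomicBool::new(false);

// Writer: do the plain writes, then a release fence, then publish a flag.
unsafe fn publish(buf: *mut u8) {
    unsafe { buf.write(42) };             // payload write
    fence(Ordering::Release);             // orders the payload before the flag
    READY.store(true, Ordering::Relaxed);
}

// Reader: the acquire fence pairs with the writer's release fence, so a
// reader that observes the flag also observes the payload.
unsafe fn consume(buf: *const u8) -> Option<u8> {
    if !READY.load(Ordering::Relaxed) {
        return None;
    }
    fence(Ordering::Acquire);
    Some(unsafe { buf.read() })
}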

Yeah, there's a protocol to signal when the buffer is ready to be written or read. The idea with this crate isn't to implement the full protocol, but just to provide enough of a building block that higher-level code can do it without needing to invoke unsafe.

That's actually an omission; I have a PR out right now to fix it :slight_smile:

Ok, I see.

Sorry, I don’t mean for this to be a code review of sorts (but hey, you did link to the code :slight_smile:), but what’s up with that volatile write to a dummy scratch tuple in new()? Is that actually needed? I see the comments there but are they true or speculation?

Also, it seems like checking the incoming buf for null would be prudent and then possibly wrapping it in NonNull.

According to the Google folks who work on LLVM, it's guaranteed to be unnecessary. But it's also hilariously cheap (a single volatile write once when you set up the connection), so I figured it was worth it to hedge. All of this is relying very heavily on details of LLVM's model of what is and isn't observable, so it seemed prudent to hedge against a minor flaw in our reasoning.

That's a good idea! I'll look into doing that.

Yeah, my question isn’t really about the performance implications but rather ... it’s weird code that some may end up cargo culting around :slight_smile:

It's a bit sad that one has to "hedge" on such things rather than there being a definitive answer (although you say the LLVM devs guarantee it's unnecessary?). If nothing is really, truly guaranteed, then one could argue LLVM can see the write is clearly bogus and elide it (yes, it's volatile, but it can clearly see the dst is bogus).

Anyway, carry on.

The trick is that, because it's a volatile write, LLVM isn't allowed to reason about the destination being bogus. For all it knows, that memory location is a memory-mapped register that results in a message getting sent to a peripheral device or something. The behavior of volatile reads and writes is actually guaranteed by the LLVM model.

Without the volatile write, you're relying on LLVM not knowing where the memory came from, and thus not knowing whether it's safe to elide writes. That's the thing that the LLVM devs are certain is safe, but again, hedging my bets :slight_smile:

My hope is that most people treat this code as a black box and don't try to understand its internals, but it's certainly a risk :stuck_out_tongue:

The code does a volatile write to a dummy and then proceeds to pass the src buf - I don’t see how the above even comes into play. You’d presumably need to forward the ptr out of the tuple to the SharedBuffer.

A better way to think of this is as doing volatile loads and stores, not as marking some memory location in a field as volatile - at least in Rust, you can't have a volatile location (AFAIK). So then the question is whether copy_nonoverlapping has volatile load/store semantics. It probably doesn't, because this exists: volatile_copy_nonoverlapping_memory in std::intrinsics - Rust
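
In stable Rust, that per-access distinction is ptr::read_volatile / ptr::write_volatile, e.g. (sketch):

use std::ptr;

// Volatility is a property of each access, not of the location: the same
// address can be read either way.
unsafe fn demo(p: *mut u8) {
    let _plain = unsafe { ptr::read(p) };         // ordinary load; may be elided or merged
    let _kept = unsafe { ptr::read_volatile(p) }; // volatile load; must actually happen
    unsafe { ptr::write_volatile(p, 0xFF) };      // volatile store; must actually happen
}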

From a human's point of view, it's obvious that this is meaningless, but LLVM is required according to the definition of a volatile write to treat the value as having escaped.

It's not about marking a location as volatile, but rather about convincing LLVM that writes are observable. E.g., if you do a non-volatile write to a temporary local variable and then do nothing with it, LLVM can easily prove that the write is never observed, and thus that it doesn't have to make it into the compiled binary. The trick is to make LLVM think that somebody could observe the write later, so that eliding it would make the binary's observable behavior differ from that of the source code.

This seems bizarre but that’s likely because I don’t know LLVM well enough. So given a ptr stored in some struct, when LLVM sees this field (ptr) at some later point in time, what can it assume without the volatile write thing? I’m having a hard time understanding what type of code would go sideways (potentially) without this. In particular, assume the later loads/stores themselves are volatile, but even without that I’m still curious.

Concretely, consider the following code:

use std::ptr;

fn foo() -> usize {
    let mut a = 0usize;
    // Write through a raw pointer derived from `a`; nothing ever reads it back.
    ptr::write((&mut a) as *mut usize, 1);
    1
}

LLVM is required to produce a binary which behaves exactly like the semantics of this source code, which includes writing the value 1 through the pointer derived from a. However, since nobody reads a after that happens, a program which doesn't actually perform the write is observably equivalent to one which does, so eliding the write is allowed.

However, if we modify the code as follows:

use std::ptr;

fn foo() -> usize {
    let mut a = 0usize;
    // `send_over_a_channel` stands in for anything that publishes the
    // pointer where another thread could see it.
    send_over_a_channel((&mut a) as *mut usize);
    ptr::write((&mut a) as *mut usize, 1);
    1
}

then LLVM has to assume that somebody in another thread might read the pointer out of the channel and look at its contents. If it completely elides the write, that could result in a program whose behavior is meaningfully different from the behavior implied by the source code, and so LLVM is not allowed to elide the write.

So what we're trying to do in shared-buffer is to ensure that LLVM can't elide the write as in the first instance of foo. To do that, we trick it into believing that somebody else is looking at the memory.
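
In sketch form, the new() hedge amounts to something like this (simplified; not the crate's exact code):

use std::ptr;

pub struct SharedBufferSketch {
    buf: *mut u8,
    len: usize,
}

pub fn new(buf: *mut u8, len: usize) -> SharedBufferSketch {
    // One volatile store of (buf, len) to a throwaway local. A volatile store
    // may be externally observable, so LLVM has to treat the pointer as
    // having escaped here, and can no longer prove that later writes through
    // it are unobserved.
    let mut scratch = (ptr::null_mut::<u8>(), 0usize);
    unsafe { ptr::write_volatile(&mut scratch, (buf, len)) };
    SharedBufferSketch { buf, len }
}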

Right - it has to assume that, unless it can reason about the memory not being used, like in your first example. How can it do that, though, given that the value will presumably be written eventually with some sort of volatile (or atomic) operation?

I guess what seems weird is why new() does this against the ptr, rather than leaving the actual stores/loads to be volatile (and/or atomic). I understand what you’re trying to do, I guess, but new() just seems like an odd location.

May the force be with all people who are/will be involved in specifying unsafe Rust guidelines :slight_smile:

Performance. Volatile reads and writes severely limit what LLVM can do around optimization: not only can it not reorder them with respect to one another, it can't even coalesce them. So if you use the read method to read a few bytes many times in a row, LLVM can't optimize that into a single memory read.
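
For example (a sketch): reading four adjacent bytes volatilely forces four separate loads, where plain reads could be merged:

use std::ptr;

// With plain `ptr::read`, LLVM could merge these into a single four-byte
// load; with volatile reads it must emit four separate one-byte loads.
unsafe fn sum4(p: *const u8) -> u32 {
    let mut total = 0u32;
    for i in 0..4 {
        total += unsafe { ptr::read_volatile(p.add(i)) as u32 };
    }
    total
}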

Preach.