Uninitialized memory varying from read to read

Would have linked it if I wasn't on a phone: Fastest way of communication between threads of same program? - #66 by simonbuchan

Until you upgrade your compiler. Or you change something in a completely unrelated piece of the program that affects the compiler's inlining heuristics, and now some obscure optimization takes place and your code goes wrong. Or even your user upgrades their OS. Relying on Undefined Behavior is how we get bugs, and worse, vulnerabilities. Yes, it happens with C, because it is practically impossible for a human not to do it. But this is exactly what we try to do better with Rust; it is the very reason for its existence.

No, it's exactly the opposite. If we go with the stronger model, every load and store must be synchronized and the weaker model is penalized. If, on the other hand, we decide on the weaker model, both can be maximally performant. Yes, it will require the programmer to mark the places where the program requires synchronization, but more programmer work for more performant code is exactly the kind of trade-off Rust wants. You can always use SeqCst atomics for all operations if you want (I don't think this is free even on x86), but Rust will always give you the ability to choose.


Finally actually tested this: loads and stores always use unlocked mov, except for SeqCst stores, which use the always-locked xchg. Otherwise everything seems to always use locked instructions: Compiler Explorer - not sure why even a Relaxed cmpxchg is locked - presumably because otherwise each core could "successfully" exchange against its own cached copy, so it wouldn't actually be atomic?
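For reference, a minimal sketch of the kind of functions one can paste into Compiler Explorer to reproduce this (the function names are mine; on x86-64 the Relaxed store comes out as a plain mov, the SeqCst store as xchg, and fetch_add as lock xadd regardless of ordering):

use std::sync::atomic::{AtomicU64, Ordering};

pub fn store_relaxed(a: &AtomicU64) {
    a.store(1, Ordering::Relaxed); // plain `mov`
}
pub fn store_seqcst(a: &AtomicU64) {
    a.store(1, Ordering::SeqCst); // `xchg`, which is implicitly locked
}
pub fn add_relaxed(a: &AtomicU64) -> u64 {
    a.fetch_add(1, Ordering::Relaxed) // `lock xadd`, even though it's Relaxed
}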


https://stackoverflow.com/questions/24234769/why-does-memory-order-relaxed-use-atomic-lock-prefixed-instructions-on-x86


The MESI negotiations before that will bring the correct cache lines in and obtain exclusive write access before it attempts the op. (This is what LOCK means now, most of the time.)

I'm not sure what you're saying here. Are you claiming that cmpxchg or add or whatever without a lock somehow get MESI negotiated? Perhaps you missed the "otherwise"?

I'm saying that the scenario you are describing, if I understand what you are saying, doesn't happen. The cache line will be brought into the L1 as exclusive, so no other core would be trying to write to it. LOCK = MESI for the most part in the last decade. Atomics are atomic because of the MESI negotiations for them. The idea of asserting a control line doesn't happen (I've seen one comment about that happening in the last few years, and it dealt with a cache-line-split unaligned access after the split buffers were all occupied).

And yes, everything gets MESI negotiated (everything that can be cached). That's kind of why aligned access is atomic on Intel. (Once cache is your source of truth over main memory and you have to keep it consistent, everything being atomic seems like a natural endpoint.)

Not in this case. In this case the arch with the stronger model is also the faster one, and the one that handles 99% of the performance-oriented workloads between the two. In this case, the weaker-model machine is slower and doesn't deal in workloads that have such fierce performance requirements. The precise reason I, and others like me, would like a change along these lines is that it would allow x64 machines to use what they have more naturally.

And then you would have to rewrite all the packages you use, too. Also, I'm not so sure that you aren't giving up optimization opportunities by doing that, and you might be confusing the compiler out of other optimization chances. You can hack your way to something that is a local solution for just your code, but it isn't sustainable.

That's actually a really interesting idea, making a new target platform. It sounds like something I'd spend too much of a weekend trying out, even.

If there was a platform where the alignment of all data 32 bits and under was 1 byte, could Rust handle that?

Just read that Rust will repack your structs to reduce gaps. Ugh. Now I have to go read up on that - it sounds a little scary to repack your flag onto the same cache line as a spin lock, or, when you design network protocols, you want the important stuff on the first cache line for filtering so you don't decode more than needed... It's not a horrible decision to do that; it would just be nice to turn it off field by field.
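(For what it's worth, the repacking only applies to the default layout; a struct-level #[repr(C)] keeps declaration order, though there is no per-field switch. A rough illustration:

#[repr(C)] // declaration order is kept; padding follows C rules
struct Wire {
    flag: u8,     // stays first, e.g. for cheap filtering
    counter: u64,
    tag: u8,
}
struct Packed { // default (Rust) layout: fields may be reordered to shrink padding
    flag: u8,
    counter: u64,
    tag: u8,
}
fn main() {
    // On x86-64: 24 vs (typically) 16 - the default layout moves the two u8s together.
    println!("{} {}", std::mem::size_of::<Wire>(), std::mem::size_of::<Packed>());
}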

I would be more cautious claiming that. Even today there is performance-sensitive ARM code, and ARM may be much more common in the future. But it doesn't matter at all which is more common: if there was a contradiction, i.e. being fast on one machine would mean being slow on the other, your argument would hold. But as long as being fast on ARM only requires being more careful when coding - and doesn't impact x86's performance - this is not a trade-off Rust is willing to make. Like I said, Rust will always allow you to do the fastest thing, because Rust prioritizes control over ease of development. Other languages make other trade-offs, and you're welcome to make your own language that does what you want. I don't know of an existing language that does that (which doesn't mean there isn't one, obviously); I suspect degrading ARM's performance is not a very wise thing to do.

No, you just benefit from them not using atomics on weak memory model platforms without additional work (as long as they don't contain bugs in their treatment of atomics).

Then you can use Relaxed.

I don't really know how we went from uninitialized-memory UB to discussions about synchronization; however: treating uninitialized memory as UB is done less because "this is what the hardware does" and more for the sake of compiler optimizations.

I think most people would agree with this statement. May I suggest that you review your own posts with it in mind? I’ve been watching this discussion from the sidelines, and neither side has given an inch in their respective positions.

Other than arguing for the fun of it, I'm struggling to understand what you expect to gain from this conversation. You clearly disagree with the way Rust deals with low-level architecture details and UB, but none of the people you're talking to can do anything about it; those people hang out on IRLO instead of here.


See, not reading uninit memory, since most compilers like to define that particular UB. However I can write code using uninit that on my own compiler would look very much like unreachable_unchecked.

In Rust, the code in question would be

// Reading an uninitialized bool: UB that the compiler may treat as unreachable.
let x: bool = unsafe { core::mem::uninitialized() };
if x {}

Incidentally, LLVM likes it just as much.

Technically, in LLVM reading uninitialized memory does not produce UB but undef.

Yep. LLVM also really likes speculating reads.

It could be poison but a) I don't think there was poison back then and b) I think it doesn't match C semantics.

(Sorry for not reading the entire thread. I got pinged in the middle of it and this is longer than what I have time to read entirely right now -- so either I react based on a quick skim or I ignore the ping entirely...)

As far as I am concerned, this is what the comment means. Some people like to think of the Rust Abstract Machine as the source of truth for execution behavior (and you seem to be among them), others think of it as just describing what happens on the "real" execution of an actual CPU -- and that is the mindset which the documentation targets here.

Is that true? My impression was that this can only happen for memory which is truly uninitialized. Otherwise this seems impossible to program against even in C.

My comment was certainly written under the assumption that only never-written-to memory can lose its data, and that unobservable events like memory pressure swapping a page to disk do not affect what the program can see.

I think, as used by jemalloc, it's only observable if you use unsafe code to read memory that has been freed, but yes, in general the documentation seems to say that reads of pages marked MADV_FREE will return whatever was written there previously until memory pressure occurs, at which point they will suddenly return 0. Any write that happens between marking the pages MADV_FREE and memory pressure actually occurring cancels the operation.

I haven't looked at the jemalloc code but my guess is this is used as a way to speed up memory use while still giving pages back to the OS. When the user frees memory the pages can go into a pool, but are also marked MADV_FREE. This way if there is no memory pressure the pages can get reused, but if there is memory pressure the pool code doesn't need to detect and move the pages out of the pool, they will just get transparently reallocated from the OS the next time they are written to. But the OS will only zero the pages when that transparent allocation happens in response to memory pressure and a subsequent write, not when you first use MADV_FREE on the pages.

This is also "safe" in C if the user has no memory safety bugs, so they never read freed memory. But if you do, you get a fun new source of nondeterminism where your hard-to-debug memory safety issues become super duper hard to debug, because reproduction now depends on memory pressure :grimacing:

This optimization only makes sense when applied to pages that have been written to before. If the page has never been written to ("truly uninitialized") then Linux won't have allocated it at all. Before a write happens all readable pages are mapped to the same all zero page so no memory use actually occurs, then they get lazily allocated on write.

So in summary, I think using MADV_FREE for anything other than the narrow use case of speeding up allocators is probably unsound, and in that context it is only ever used on memory that is free from the POV of the allocator. Since you're not supposed to read freed memory anyway, I don't think MADV_FREE should be considered when thinking about uninitialized memory semantics.
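For illustration, a deliberately nondeterministic, Linux-only sketch of that behaviour, assuming the libc crate (whether the final read sees 42 or 0 depends on whether the kernel reclaimed the page in the meantime):

fn main() {
    unsafe {
        let len = 4096;
        let p = libc::mmap(std::ptr::null_mut(), len,
                           libc::PROT_READ | libc::PROT_WRITE,
                           libc::MAP_ANONYMOUS | libc::MAP_PRIVATE, -1, 0) as *mut u8;
        assert_ne!(p, libc::MAP_FAILED as *mut u8);
        p.write(42);                                       // dirty the page
        libc::madvise(p as *mut _, len, libc::MADV_FREE);  // kernel may now reclaim it lazily
        // A write here would cancel the pending free; a read sees stale data or zero.
        println!("{}", p.read());
    }
}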

Doc:

MADV_FREE (since Linux 4.5)
The application no longer requires the pages in the range specified by addr and len. The kernel can thus free these pages, but the freeing could be delayed until memory pressure occurs. For each of the pages that has been marked to be freed but has not yet been freed, the free operation will be canceled if the caller writes into the page. After a successful MADV_FREE operation, any stale data (i.e., dirty, unwritten pages) will be lost when the kernel frees the pages. However, subsequent writes to pages in the range will succeed and then the kernel cannot free those dirtied pages, so that the caller can always see just-written data. If there is no subsequent write, the kernel can free the pages at any time. Once pages in the range have been freed, the caller will see zero-fill-on-demand pages upon subsequent page references.

That's enough though, isn't it?
Of course you cannot just call MADV_FREE on arbitrary data, but you can call it on a memory range that the AM says is entirely uninitialized. So there is a safe way of using this feature.

Some people are not convinced by arguments like "this is UB according to the Rust spec", and insist that we can just always read any data and do whatever we want with it, since it's all just bytes in memory and nothing weird can happen. If compilers do optimizations that lead to "unstable reads", that is taken as another point for compilers being terrible and breaking people's code. The MADV_FREE argument is meant to help convince those people, by showing another possible source of unstable values. The MADV_FREE argument is for people who don't agree that "you're not supposed to read freed [or generally, uninitialized] memory".

It looks like you are already fine accepting the Rust Abstract Machine and its rule for uninit memory as a source of truth, and in that case indeed MADV_FREE doesn't change anything. (Until we talk about adding freeze to the language, which is made tricky if we want to support allocators that use MADV_FREE.)

Nota bene: "reading" uninit memory is actually fine in Rust, as long as you do it at a type like MaybeUninit. There is no rule which says that it is UB to read uninit memory. However, there is a rule which says that reading some memory at a given type T imposes some requirements (e.g., a bool has to be true or false), and almost all (but not all) types impose as part of this that memory must be initialized.
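A minimal illustration of that distinction (the commented-out line is the UB one):

use core::mem::MaybeUninit;

fn main() {
    let x: MaybeUninit<u8> = MaybeUninit::uninit();
    let _y = x; // fine: copying/reading the value "at type MaybeUninit<u8>" is allowed
    // UB, by contrast: a bool must be 0 or 1, and assume_init asserts "initialized and valid":
    // let _b: bool = unsafe { MaybeUninit::<bool>::uninit().assume_init() };
}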


Well… as far as I can tell by this description:

it sounds like the correct model of this situation on the abstract machine is simply that marking a page with MADV_FREE will turn the entire contents of the page into uninitialized memory. Additionally (as an effect not modeled in the abstract machine), this fact that the whole page only contains uninitialized memory is communicated to the OS so that your program won’t waste memory it isn’t using. (After all, you don’t need to store any actual information to represent a chunk of uninitialized memory.) Of course, the need to do this kind of operation of (effectively) freeing large chunks of memory is particularly useful for memory allocators, so there might not be much use outside of this application for that reason alone.

That’s fairly straightforward though, I don’t see how that in particular would be a “very unsafe” operation compared to other unsafe operations. Arguably, it could even be a safe operation, depending on the API you’re giving it; e.g. other operations that turn memory uninitialized are safe, too, such as writing MaybeUninit::uninit() to some &mut MaybeUninit<u8> that might have previously contained some initialized data. And – yes – you can also use an initialized MaybeUninit in safe code, e.g. by using the &mut T reference that MaybeUninit<T>::write returns.

On second thought, it might actually be a safe operation to simply implement a function that takes &mut [MaybeUninit<u8>] and marks all the pages fully contained within that slice with MADV_FREE. For all safety purposes, the semantics are essentially the same as writing MaybeUninit::uninit()s to the whole slice.
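A rough sketch of what I mean, assuming Linux and the libc crate (the helper name and the page-rounding are mine; error handling omitted):

use core::mem::MaybeUninit;

fn madv_free_whole_pages(buf: &mut [MaybeUninit<u8>]) {
    let page = unsafe { libc::sysconf(libc::_SC_PAGESIZE) } as usize;
    let start = buf.as_mut_ptr() as usize;
    let end = start + buf.len();
    // Only touch pages that lie entirely inside the caller's slice.
    let first = (start + page - 1) / page * page;
    let last = (end / page) * page;
    if first < last {
        // Semantically this just makes the bytes uninitialized again, which
        // MaybeUninit<u8> already permits, so the signature can stay safe.
        unsafe { libc::madvise(first as *mut libc::c_void, last - first, libc::MADV_FREE); }
    }
}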
