As much as I hate this, most C compilers (including Clang/LLVM) allow caching of "volatile" variables: "volatile" only constrains what the compiler does with the access, not what the cache does. This means that whatever you declare as "volatile" can easily end up in your L1 cache. And when you re-read it, the underlying HW will happily give you the value from L1, even though it might already have changed in external memory (if it's a memory-mapped register, or another core writing to it, or a DMA controller, etc). The only thing that can help you in this case is hardware cache coherency machinery, but many embedded systems don't have it, and even if they do, it may not cover DMA and memory-mapped peripherals.
Going back to Rust. What I have is a processor with an L1 data cache, a DMA controller, and some external memory. The DMA controller does not go through the processor's L1 cache. DMA descriptors and some other auxiliary DMA data structures are located in the external memory: written by the processor, read by the DMA controller, and sometimes written by the DMA controller too.
The ISA of my processor has uncached load and uncached store instructions, and also allows me to flush and invalidate cache lines individually. The question I have is how to do this in Rust reliably. Reliability is key. I know many ugly ways to do this in C, and most of those ways sooner or later caused me, my colleagues, and my clients real-world cache coherency issues which took MONTHS to debug.
I would appreciate it if someone could share the approaches they use to make this reliable. Here are my thoughts:
1. The problem with flushing/invalidating cache lines is that if data structures are small enough to share the same line, the processor can change one of them while the DMA controller changes the other, and data corruption is inevitable. This would only work for data structures whose size is a multiple of the cache line size (and which start on a line boundary), OR for data structures which are properly aligned and padded to ensure that nothing else can end up in the same line (but that won't work for arrays whose layout is dictated by HW). If these constraints can be satisfied, I can create something similar to the read_volatile() and write_volatile() functions from core::ptr. But can I guarantee the constraints at compile time? I don't know.
2. I can use inline assembly to perform word-sized uncached loads and stores, and read/write larger structures word-by-word (this is slow in terms of HW speed, but at least it would work with uncached arrays). But I need to ensure that such variables are never accidentally cached, because this will also cause data corruption. For example, suppose a u32 at address 0x0 is accessed as uncached and an i32 at address 0x4 is accessed as usual. If you access the i32 first, both addresses 0x0 and 0x4 get into L1 because they belong to the same cache line. Later you do an uncached write of 0xdeadbeef to the u32, so its copy in L1 is no longer coherent, and finally you do a normal write to the i32, making the line dirty. Sooner or later this dirty line is likely to get evicted, overwriting your 0xdeadbeef with the initial value of the u32. So perhaps it can be solved by putting all uncached variables in a separate linker section, but what if I forget to do so? This needs to be detectable at compile time too. And what if I access some of the variables in this section the usual way, so they get cached, and I'm back to square one?
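The constraints from Item 1 can at least be written down as compile-time checks. The following is only a minimal sketch under assumptions: a 64-byte cache line, and the names `CACHE_LINE`, `LineOwned` and `DmaDescriptor` (with its made-up field layout) are hypothetical and not taken from any real HAL. It relies on the fact that Rust rounds a type's size up to a multiple of its alignment, so an aligned wrapper is also padded.

```rust
use core::mem::{align_of, size_of};

/// ASSUMPTION: cache line size in bytes; check your core's manual.
pub const CACHE_LINE: usize = 64;

/// Hypothetical wrapper that gives `T` its own cache line(s).
/// Because a type's size is always a multiple of its alignment,
/// the trailing padding is added automatically.
#[repr(C, align(64))] // must be kept in sync with CACHE_LINE
pub struct LineOwned<T>(pub T);

impl<T> LineOwned<T> {
    /// Number of cache lines to flush or invalidate for this value.
    pub const LINES: usize = size_of::<Self>() / CACHE_LINE;
}

/// Made-up descriptor layout, for illustration only.
#[repr(C)]
pub struct DmaDescriptor {
    pub src: u32,
    pub dst: u32,
    pub len: u32,
    pub flags: u32,
}

// Compile-time checks: the build fails if either one is violated,
// for example if CACHE_LINE and the align(...) value drift apart.
const _: () = assert!(align_of::<LineOwned<DmaDescriptor>>() == CACHE_LINE);
const _: () = assert!(size_of::<LineOwned<DmaDescriptor>>() % CACHE_LINE == 0);
```

This does not help with arrays whose element stride is dictated by HW, since every `LineOwned<T>` element occupies at least one full line. The flush and invalidate operations themselves are ISA-specific and are left out here.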
I need a reliable solution for at least Item 2, and ideally for both 1 and 2. For now my only option is to disable the D-cache.
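One possible direction for Item 2 is to make the type system rule out ordinary accesses: a cell type with a private field that never hands out a reference to its contents, so the only way to reach the word is through `load()` and `store()`. This is a sketch, not a tested solution. `UncachedU32`, `UncachedDescriptor` and `DESC` are hypothetical names, and `read_volatile`/`write_volatile` are used only as placeholders so the example builds on any host. They do NOT bypass the cache; on the real target the two marked lines would be replaced by `core::arch::asm!` blocks emitting the ISA's uncached load and store.

```rust
use core::cell::UnsafeCell;
use core::ptr;

/// Hypothetical cell for a word that must only be touched with
/// uncached loads and stores. The field is private and no `&u32`
/// or `&mut u32` is ever handed out, so code outside this module
/// cannot perform an ordinary (cacheable) access to it.
#[repr(transparent)]
pub struct UncachedU32(UnsafeCell<u32>);

// SAFETY (assumption): every access is a single word-sized load or
// store, and ordering against the DMA controller is the caller's job.
unsafe impl Sync for UncachedU32 {}

impl UncachedU32 {
    pub const fn new(v: u32) -> Self {
        Self(UnsafeCell::new(v))
    }

    #[inline(always)]
    pub fn load(&self) -> u32 {
        // PLACEHOLDER: swap for the ISA's uncached load via asm!.
        unsafe { ptr::read_volatile(self.0.get()) }
    }

    #[inline(always)]
    pub fn store(&self, v: u32) {
        // PLACEHOLDER: swap for the ISA's uncached store via asm!.
        unsafe { ptr::write_volatile(self.0.get(), v) }
    }
}

/// Made-up descriptor built only from uncached words, so reading or
/// writing it word-by-word is the only thing the API allows.
#[repr(C)]
pub struct UncachedDescriptor {
    pub src: UncachedU32,
    pub len: UncachedU32,
}

// On the target this static would additionally be placed in a
// dedicated section, e.g. with #[link_section = ".uncached"]
// (the section name is an assumption and must match the linker script).
pub static DESC: UncachedDescriptor = UncachedDescriptor {
    src: UncachedU32::new(0),
    len: UncachedU32::new(0),
};
```

Usage would look like `DESC.src.store(0xdead_beef)` followed later by `DESC.src.load()`. The type alone does not detect a forgotten linker section, and it does not stop an unrelated cached variable from sharing the line, so the separate section (or line-sized alignment) is still needed on top of it.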
Thanks