It occurred to me that, especially on AVR microcontrollers, the #[link_section] attribute, since it is not marked as unsafe, seems to be unsound, because it allows safe Rust code to cause Undefined Behavior.
My reasoning goes like this: the AVR architecture has separate address spaces for program code and data, but numerically both use the same addresses. Ordinary data, including ordinary statics, resides in the data space. However, one can easily (that is, without any unsafe) annotate an arbitrary static with the #[link_section] attribute, which can place it in program space instead. On AVR that is often very reasonable, since flash is much larger than RAM.
Now, if not enough care is taken, or if one just wants to challenge Rust, the value can simply be copied out of the static. But Rust is not aware that this static lives in program space rather than in data space, so instead of loading the value from program space, it loads 'arbitrary' data from the data space that just happens to have the same numeric address.
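To make the address-space confusion concrete, here is a host-side model (not AVR code) of a Harvard machine. `HarvardMachine`, `ld`, `lpm` and the helper functions are hypothetical names; `ld` and `lpm` merely stand in for AVR's data-space load and program-space load instructions. The only point is that the same numeric address yields different bytes depending on which instruction reads it.

```rust
/// Toy model of a Harvard machine: two separate memories that share
/// the same numeric address range. (Hypothetical, for illustration only.)
pub struct HarvardMachine {
    /// Program space (flash on AVR).
    pub program: Vec<u8>,
    /// Data space (SRAM on AVR).
    pub data: Vec<u8>,
}

impl HarvardMachine {
    /// Models the ordinary data-space load (`ld` on AVR).
    pub fn ld(&self, addr: u16) -> u8 {
        self.data[addr as usize]
    }

    /// Models the program-space load (`lpm` on AVR).
    pub fn lpm(&self, addr: u16) -> u8 {
        self.program[addr as usize]
    }
}

/// Builds a machine with `len` bytes of 42 at `addr` in program space,
/// while the data space at the same addresses holds unrelated bytes.
pub fn demo_machine(addr: u16, len: usize) -> HarvardMachine {
    let mut program = vec![0u8; 256];
    // Unrelated "RAM contents": the low byte of each address.
    let data: Vec<u8> = (0..256).map(|i| i as u8).collect();
    for i in 0..len {
        program[addr as usize + i] = 42;
    }
    HarvardMachine { program, data }
}

/// What the compiled `PROG_BLOB[idx]` effectively does: a data-space load.
pub fn read_as_rust_does(m: &HarvardMachine, base: u16, len: usize) -> Vec<u8> {
    (0..len).map(|i| m.ld(base + i as u16)).collect()
}

/// What the code should do instead: a program-space load.
pub fn read_correctly(m: &HarvardMachine, base: u16, len: usize) -> Vec<u8> {
    (0..len).map(|i| m.lpm(base + i as u16)).collect()
}
```

With the blob at address `0x10`, `read_correctly` returns the expected 42s, while `read_as_rust_does` returns whatever happens to be in RAM at `0x10..0x14`.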
A code example may look like the following:
```rust
use void::ResultVoidExt; // provides `void_unwrap()`

// Store PROG_BLOB in program space; ".text" would also work
#[link_section = ".progmem"]
static PROG_BLOB: [u8; 128] = [42; 128];

fn main() -> ! {
    let mut serial = /* initialize a serial output */;
    let mut idx = 0;
    loop {
        // This access is invalid: Rust emits a normal data-space load
        // (`ld`), whereas the data lives in program space and has to be
        // read with the special `lpm` instruction.
        let b = PROG_BLOB[idx];
        // Dumps arbitrary RAM data instead of 42s!
        ufmt::uwrite!(&mut serial, "{:?} ", b).void_unwrap();
        idx += 1;
        if idx == PROG_BLOB.len() {
            break;
        }
    }
    loop {
        // Just loop forever
    }
}
```
I ran the code above on my Arduino Uno without a single unsafe and got random RAM values on the serial output. That sounds like Undefined Behavior, or did I miss something?
Can attributes be unsafe at all?
The only workaround I can think of would be a proc_macro, used instead of #[link_section], that hides the original symbol and instead exposes some kind of getter function that accesses the data safely.
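A minimal sketch of that idea, using a `macro_rules!` macro instead of a proc_macro. `ProgMem`, `load`, `load_at` and `progmem_static!` are hypothetical names. On a real AVR target the macro would also emit `#[link_section = ".progmem"]` and `load` would copy the bytes out with `lpm`; on the host both are replaced by an ordinary static and an ordinary read, so this only demonstrates the encapsulation: safe code outside the module can never index the static directly.

```rust
mod progmem {
    /// Opaque wrapper around a value stored in program memory. The field
    /// is private, so code outside this module cannot read it with an
    /// ordinary (data-space) load.
    pub struct ProgMem<T> {
        inner: T,
    }

    impl<T> ProgMem<T> {
        /// # Safety
        /// The caller must ensure the wrapper is placed in program memory;
        /// only the `progmem_static!` macro is supposed to call this.
        pub const unsafe fn new(inner: T) -> Self {
            ProgMem { inner }
        }
    }

    impl<T: Copy> ProgMem<T> {
        /// Copies the whole value out of program memory.
        pub fn load(&self) -> T {
            // Assumption: on AVR this would be an `lpm`-based copy.
            // On the host it can only be modeled with a plain read.
            self.inner
        }
    }

    impl<T: Copy, const N: usize> ProgMem<[T; N]> {
        /// Loads a single element without copying the whole array.
        pub fn load_at(&self, idx: usize) -> Option<T> {
            self.inner.get(idx).copied()
        }
    }
}

/// Declares a static that can only be accessed through `ProgMem` getters.
macro_rules! progmem_static {
    ($vis:vis static $name:ident : $ty:ty = $val:expr;) => {
        // On AVR the real macro would add #[link_section = ".progmem"] here.
        $vis static $name: $crate::progmem::ProgMem<$ty> =
            unsafe { $crate::progmem::ProgMem::new($val) };
    };
}

progmem_static! {
    static PROG_BLOB: [u8; 128] = [42; 128];
}
```

Because the field is private and `new` is `unsafe`, the getters are the only safe way to reach the data, which is exactly where an architecture-specific load would live.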
I agree that using custom link sections is kind of dangerous, but I don't know how Rust itself could help here...
This could also be solved if Rust were aware of separate memory locations that have to be accessed differently. Maybe the Rust AVR port is already working on that, but I did not check.
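One way to picture Rust being aware of separate memory locations is to carry the address space in the type of an address, so that choosing the wrong load becomes a type error instead of a silent misread. This is a hypothetical host-side model (`Memory`, `Space`, `Program`, `Data` and `Addr` are made-up names), not how rustc or LLVM actually represent address spaces.

```rust
use std::marker::PhantomData;

/// Marker types for the two address spaces of a Harvard machine.
pub struct Program;
pub struct Data;

/// Toy memory: program space and data space share numeric addresses.
pub struct Memory {
    pub program: Vec<u8>,
    pub data: Vec<u8>,
}

/// Each address space knows which "instruction" reads from it.
pub trait Space {
    fn load(mem: &Memory, addr: u16) -> u8;
}

impl Space for Program {
    // Stands in for `lpm` on AVR.
    fn load(mem: &Memory, addr: u16) -> u8 {
        mem.program[addr as usize]
    }
}

impl Space for Data {
    // Stands in for `ld` on AVR.
    fn load(mem: &Memory, addr: u16) -> u8 {
        mem.data[addr as usize]
    }
}

/// An address tagged with its address space at the type level.
pub struct Addr<S: Space> {
    raw: u16,
    _space: PhantomData<S>,
}

impl<S: Space> Addr<S> {
    pub fn new(raw: u16) -> Self {
        Addr { raw, _space: PhantomData }
    }

    /// The load is chosen by the type, not by the caller.
    pub fn read(&self, mem: &Memory) -> u8 {
        S::load(mem, self.raw)
    }
}
```

An `Addr<Program>` and an `Addr<Data>` with the same numeric value are different types, so passing one where the other is expected fails to compile.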
The developers of the AVR port of Rust are of course well aware of the fact that AVR is a Harvard architecture; in fact, one of the longest-blocking issues that prevented merging the AVR fork back into the main repository was the fact that rustc didn't handle address spaces correctly.
Sounds like they need to be able to. Maybe items marked with #[link_section] should be marked as unsafe by the compiler (or require an unsafe keyword).
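For reference, later Rust versions went in this direction: RFC 3325 ("unsafe attributes"), stabilized in Rust 1.82, allows writing `#[unsafe(no_mangle)]`, `#[unsafe(export_name = "...")]` and `#[unsafe(link_section = "...")]`, and the 2024 edition makes the `unsafe(...)` wrapper mandatory for these three attributes. A small sketch, assuming a Rust 1.82+ toolchain; the symbol name `progmem_demo_answer` is made up, and the `link_section` static is only compiled for AVR because valid section names differ between object formats.

```rust
// Requires Rust 1.82 or newer (RFC 3325, "unsafe attributes").

// `unsafe(...)` marks that *we* promise this symbol name does not clash
// with anything else the linker sees (e.g. a libc function like `write`).
#[unsafe(no_mangle)]
pub extern "C" fn progmem_demo_answer() -> u8 {
    42
}

// Only compiled for AVR: the programmer promises to read PROG_BLOB
// exclusively with program-space loads.
#[cfg(target_arch = "avr")]
#[unsafe(link_section = ".progmem")]
pub static PROG_BLOB: [u8; 128] = [42; 128];
```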
The issue pointed out by OP seems 100% legitimate – it does genuinely cause UB without unsafe, so it is unsound.
#[link_section] and similar linkage-controlling attributes are unsound in general, not just on Harvard architectures. You can put non-zeroable data into .bss, put mutable data in read-only sections, and just cause havoc in general when combining this with a linker script.
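To see why non-zero data in .bss is a problem: .bss is not stored in the binary at all, it is simply zero-filled at startup. The host-side model below (the `Section` enum and `startup_image` function are hypothetical, not a real linker API) shows what a static initialized with 42s would actually contain at run time if it were forced into .bss. For types such as `NonZeroU8` or `&T`, an all-zero bit pattern is not even a valid value, so reading such a static would be immediate UB.

```rust
/// Where a static's initial bytes come from at program start.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Section {
    /// .data: the initializer stored in the binary ends up in RAM.
    Data,
    /// .bss: only the size is recorded; the region is zero-filled.
    Bss,
}

/// Models what RAM holds at program start for a static with the given
/// initializer, depending on the section it was placed in.
pub fn startup_image(initializer: &[u8], section: Section) -> Vec<u8> {
    match section {
        Section::Data => initializer.to_vec(),
        // The initializer is silently discarded.
        Section::Bss => vec![0; initializer.len()],
    }
}
```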
Well, I did write a crate, avr-progmem, which wraps your .progmem static in an opaque wrapper via a macro and only offers getters that load it directly from progmem. But you can still use #[link_section] on your own and potentially cause UB. That is essentially how I became aware of the issue.
Well, I'm not convinced that #[link_section] causes UB on, say, x86: if you put a function in the data segment, or an atomic integer (which is safely mutable) in the code segment, all pointers and Rust operations are actually correct; it's just the MMU that terminates your program because you screwed up. But that is quite deterministic and, I would say, expected. Also, the Nomicon explicitly does NOT count aborting the program as undefined behavior, as you can read at the end of "what unsafe does".
Also note that if you put a static with a non-zero initializer into .bss, it doesn't even compile: Playground example
But at least it is safe if one only uses your crate. I think this is a great solution until Rust supports something like this natively.
Maybe you are right, but you could still introduce UB if you screw up hard enough. And ARM Cortex-M cores, for example, have no MMU at all (at most an optional MPU). Then all bets are off.
Yes, there seems to be an LLVM safeguard to prevent .bss from containing non-zero data. But that error is not something rustc guarantees. For example, the Cranelift backend might allow code like that.
Being able to put arbitrary data into read-only sections is still unsound and causes immediate undefined behavior, though, because rustc expects such data to be readable and writable and will optimize under that assumption. There is no guarantee that the effect is a program abort, or that the abort will happen when the code writes to the variable.
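A toy illustration of the optimization argument: once the compiler assumes a static is writable, it may forward a value it just stored to a later read instead of reading memory again (store-to-load forwarding). In the model below (hypothetical names; a `Vec` stands in for memory, and a flag makes writes silently disappear, as for read-only memory with nothing to trap them), the "optimized" and "unoptimized" versions of the same code disagree, which is why no particular outcome can be promised.

```rust
/// Toy memory where writes may be silently ignored (a read-only region
/// with no MMU/MPU to trap the write).
pub struct Mem {
    cells: Vec<u8>,
    read_only: bool,
}

impl Mem {
    pub fn new(cells: Vec<u8>, read_only: bool) -> Self {
        Mem { cells, read_only }
    }

    pub fn store(&mut self, addr: usize, val: u8) {
        if !self.read_only {
            self.cells[addr] = val;
        }
        // Otherwise the write just disappears.
    }

    pub fn load(&self, addr: usize) -> u8 {
        self.cells[addr]
    }
}

/// `x = val; return x;` compiled naively: store, then reload from memory.
pub fn unoptimized(mem: &mut Mem, addr: usize, val: u8) -> u8 {
    mem.store(addr, val);
    mem.load(addr)
}

/// The same code after store-to-load forwarding: the compiler "knows"
/// the load must return `val`, so it never reads memory again.
pub fn optimized(mem: &mut Mem, addr: usize, val: u8) -> u8 {
    mem.store(addr, val);
    val
}
```

On writable memory both functions agree; on the silently read-only memory the unoptimized version returns the old byte while the optimized one returns the value that was never actually stored.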
OK, that thing with .bss does look fishy; you've got a point there.
But about read-only data, I'm still not completely convinced. Sure, Rust does assume that all writable (e.g. atomic) statics are indeed writable, and may optimize based on that assumption. As far as I can tell, this means Rust must assume that after writing to an address, all subsequent reads produce the written value (modulo some multi-threading issues, but then it's unsound from the start).
So, to the best of my knowledge: if Rust writes to a static (assuming it is writable) and the write works, that's obviously fine. If it segfaults instead, Rust never reads the static afterwards, so the assumption still holds and this is sound. Only if Rust writes to a static and the write fails silently, so that the program continues but a subsequent read produces an old or wrong value, does it become unsound, i.e. Undefined Behavior!
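The case analysis above can be written down as a small model. `Behavior`, `Outcome` and `write_then_read` are hypothetical names, and the model deliberately ignores compiler optimizations, i.e. it assumes every write and read really reaches memory. Whether the silent-drop case can happen on a given chip depends on the hardware.

```rust
/// How a piece of memory reacts to a write.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Behavior {
    /// Ordinary RAM: the write takes effect.
    Writable,
    /// Protected memory: the write traps (e.g. segfault), program stops.
    Traps,
    /// No protection: the write is silently lost.
    SilentlyDropped,
}

/// Result of writing `new` to a cell holding `old`, then reading it back.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// The program continued and read back the written value.
    Consistent,
    /// The program was terminated before it could read anything.
    Aborted,
    /// The program continued but read a stale value: the assumption broke.
    Stale { read: u8, written: u8 },
}

pub fn write_then_read(behavior: Behavior, old: u8, new: u8) -> Outcome {
    let mut cell = old;
    match behavior {
        Behavior::Writable => cell = new,
        Behavior::Traps => return Outcome::Aborted,
        Behavior::SilentlyDropped => {} // `cell` keeps `old`
    }
    if cell == new {
        Outcome::Consistent
    } else {
        Outcome::Stale { read: cell, written: new }
    }
}
```

Only the `SilentlyDropped` case can produce `Outcome::Stale`, which matches the argument that a trapping write is deterministic while a silently lost write breaks the read-after-write assumption.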
Question is: can this ever happen on x86? I don't think so.
However, I don't know enough about ARM, maybe that could be an unsoundness on ARM.
I don't know a lot about x86, but this could totally happen on ARM, and on AVR it will absolutely happen.
In my opinion anything that is potentially unsound in one case should be treated as unsafe.
I still remember someone who wrote a totally safe #[no_mangle] fn write(...) {...}, which broke the it_works test. In general, don't assume that attributes which communicate with the linker are safe the way a safe function is. They interact directly with the underlying linking format, such as ELF, which is definitely not a safe language like Rust.