I am curious about how much "unsafe" is actually allowed in unsafe code: which rules does the optimizing compiler relax, and which still apply?
For example, what about multiple *mut pointers to the same memory?
Can the following assumption actually happen, or will it never happen?
For code like this:
fn inc_2_and_get(n: &mut i32) -> i32 {
    let p1 = n as *mut i32;
    let p2 = n as *mut i32;
    unsafe {
        *p1 += 1;
        *p2 += 1;
    }
    *n
}
fn test() -> i32 {
    let mut n = 9;
    let n = inc_2_and_get(&mut n);
    if n == 0 {
        0
    } else {
        n
    }
}
Will Rust say: "modification through two mutable pointers, OK, that is UB", and then happily compile the code like this?
unsafe only means that avoiding undefined behavior is on the programmer; it does not change anything about the fact that Rust programs must never cause undefined behavior.
That is fine, but if the compiler still optimizes unsafe code aggressively, then the programmer can't avoid undefined behavior, because avoiding it depends on the compiler not doing unexpected things that affect unsafe blocks.
Assuming you're interested in pointers here, read through the blog posts from Ralfj (Ralf's Ramblings) on UB, pointers, Stacked Borrows (or Tree Borrows, introduced more recently by Neven Villani), Miri, etc.
For the general consensus on unsafe Rust, these documents are essential:
Rules are not relaxed in unsafe code. There are two sets of rules: borrow checking and undefined behavior. In unsafe code blocks the compiler still does borrow checking, and it's your responsibility not to invoke UB.
The only thing unsafe does is allow you to write certain specific operations (e.g. dereferencing a raw pointer) that the compiler would otherwise reject unconditionally. It doesn't change the rules about what is UB; it only adds some dangerous language features.
Rust guarantees that code written only in safe Rust is sound (will not perform UB). When you use unsafe in Rust, you've unlocked additional features which allow you to write code that might be unsound (will perform UB). You still must not write unsound code, if you want a program that reliably does what you meant; it's just that the compiler is no longer helping you not write unsound code.
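As a minimal sketch of that distinction (the function name bump_via_raw is made up for illustration): creating a raw pointer is allowed in safe code, only the dereference needs unsafe, and the borrow checker keeps checking references inside the block.

```rust
/// Hypothetical helper: increments `*n` through a raw pointer.
fn bump_via_raw(n: &mut i32) -> i32 {
    // Creating a raw pointer is a safe operation.
    let p: *mut i32 = &mut *n;
    // Dereferencing it is one of the operations that `unsafe` unlocks.
    // SAFETY: `p` was just derived from a live, exclusive reference and
    // nothing else touches `*n` while `p` is in use.
    unsafe {
        *p += 1;
    }
    // Borrow checking is not switched off by `unsafe`. Uncommenting the
    // next two lines is rejected (two simultaneous `&mut` borrows of `v`):
    // let mut v = 0;
    // unsafe { let (a, b) = (&mut v, &mut v); *a += *b; }
    *n
}

fn main() {
    let mut n = 9;
    println!("{}", bump_via_raw(&mut n));
}
```

The unsafe block here only marks the spot where the programmer, not the compiler, vouches for the pointer being valid.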
Here's a set diagram illustrating how unsafe fits into the larger world of “writing sound programs”:
When you use only safe Rust code, you receive a guarantee that your program is sound (won't execute UB). You can also get this guarantee by writing programs in traditional memory-safe languages like JavaScript, Python, Java, Ruby, … which all use a garbage collector or reference counting to ensure no use-after-free UB, and have potentially-less-efficient behaviors around concurrency to ensure that no data-race UB happens (or don't have threads with shared memory at all). The point of Rust's design is to allow you to get all three of
the ability to write programs with confidence that they will be sound even if you make a mistake (in the safe subset)
efficient execution without memory management overhead (via borrow checking)
the ability to step outside of the safe subset (like writing a C extension to Python) without also stepping outside of the entire language
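If there are 2 threads executing code like this:
unsafe {
    let x = &mut *pointer; // the reference is created before the lock is taken
    lock;                  // pseudocode: acquire the lock
    *x = *x + 1;
    unlock;                // pseudocode: release the lock
}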
It will definitely break the "only one mutable reference" rule.
Will it cause UB?
I am sure no data race exists, even in a more dangerous language like C. However, Rust declares the one-mutable-reference rule and says that breaking it is UB; all I know is that it should not be broken.
That leaves me unable to predict what the code above will do, even though I could predict it in C.
Firstly, Rust style is to use a Mutex<ptr> instead of a Mutex<()> to prevent that exact mistake.
Secondly, you can easily fix the UB in that sample: move the entire unsafe block into the lock.
use std::sync::Mutex;

// XXX: Don't use a Mutex<()>, actually put the protected data inside it.
static MUTEX: Mutex<()> = Mutex::new(());
// Take the lock first; lock() returns a Result, so unwrap it to get the guard.
let _guard = MUTEX.lock().unwrap();
// SAFETY: uniqueness is enforced by _guard, validity is guaranteed by the type invariant
let x = unsafe { &mut *pointer };
*x = *x + 1;
drop(x);
drop(_guard);
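The first suggestion, putting the data inside the mutex, can be sketched as follows. This is an illustration only; the function count_with_mutex and its parameters are hypothetical, and it uses a plain i32 rather than the pointer from the question.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: `threads` threads each add 1 to a shared counter
/// `per_thread` times. The counter lives inside the mutex, so the only way
/// to reach it is through the guard returned by `lock()`.
fn count_with_mutex(threads: usize, per_thread: usize) -> i32 {
    let counter = Mutex::new(0);
    thread::scope(|s| {
        for _ in 0..threads {
            s.spawn(|| {
                for _ in 0..per_thread {
                    // The guard dereferences to `&mut i32` only while the
                    // lock is held, so no aliasing `&mut` can exist.
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                }
            });
        }
    });
    // All scoped threads have finished; take the value back out.
    counter.into_inner().unwrap()
}

fn main() {
    println!("{}", count_with_mutex(4, 1000));
}
```

No unsafe is needed in this version, because the mutable reference cannot be obtained without first holding the lock.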
Undefined behavior isn't something that is caused, it exists as a property. Having two mutable references to an object simultaneously is undefined behavior without exception.
Then what exactly is the UB? Can any visibly bad thing be observed for this particular example?
We need to know more than "it is UB, you can't do it". In fact, the behavior may be well defined by introducing a lock that keeps the two modifications from happening at the same time.
There is a famous list_head struct defined in the Linux kernel source, like this:
If the rule can never be broken, then it is impossible for Rust to implement it.
After all, at any time there are at least 2 pointers referencing the same memory when you do a write operation such as list_del/list_add, etc.
To answer the original question: you can have multiple mutable pointers that point to the same data, and you can use them to mutate that data, provided that you don't create data races. Note though that creating mutable references to the pointee may or may not invalidate those mutable pointers, depending on how you create those references. The rules are still experimental, but you should check out the Stacked Borrows and Tree Borrows models.
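A minimal sketch of two raw pointers mutating the same value (the name inc_twice_via_raw is hypothetical). It takes a conservative route: the second pointer is a copy of the first rather than a second cast from the reference, so the example does not depend on how the experimental models treat a fresh cast.

```rust
/// Hypothetical variant of the question's function.
fn inc_twice_via_raw(n: &mut i32) -> i32 {
    // Derive one raw pointer from the reference...
    let p1: *mut i32 = &mut *n;
    // ...and copy it. Raw pointers are `Copy`, and both copies may be
    // used to write as long as no reference is used in between.
    let p2 = p1;
    // SAFETY: both pointers come from the same live exclusive borrow, and
    // `n` itself is not used until the raw accesses are finished.
    unsafe {
        *p1 += 1;
        *p2 += 1;
    }
    *n
}

fn main() {
    let mut n = 9;
    println!("{}", inc_twice_via_raw(&mut n));
}
```

Running such code under Miri is a practical way to check it against the current experimental model.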
Your example in particular is ok.
Yes, the compiler is allowed to assume that no other mutable reference aliases x, so it may reorder the instructions to do *x = *x + 1 before the lock is taken, breaking your code. In practice this optimization probably won't happen, so it would be hard to show an example of code that it breaks. However, it is UB and you should not rely on it working.
UB means that your program's behaviour is not defined. Since compilers are allowed to transform programs as long as they preserve the defined behaviour, and your program doesn't have one, they are allowed to change your program into anything. This may or may not produce visible consequences, and the result could visibly change with newer versions of compilers. You basically have no guarantees about what could happen.
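To make the kind of assumption involved concrete, here is a small sketch in safe code (store_both is a made-up name). Whether a given compiler version actually performs the optimization described in the comment is not guaranteed; the point is only that it is permitted.

```rust
/// Hypothetical example of what `&mut` lets the compiler assume.
fn store_both(a: &mut i32, b: &mut i32) -> i32 {
    *a = 1;
    *b = 2;
    // Because `a` and `b` are both `&mut`, the compiler may assume they do
    // not alias, and is allowed to return the constant 1 here without
    // reloading `*a` from memory.
    *a
}

fn main() {
    let (mut x, mut y) = (0, 0);
    // With two distinct variables the function returns 1.
    println!("{}", store_both(&mut x, &mut y));
}
```

If unsafe code arranged for a and b to point at the same location, there would be no defined answer: the function might return 1 or 2 depending on how it happened to be compiled.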
It's definitely tricky to implement something like this, but not impossible. The important part is to guarantee you never take a mutable reference of the list_head which would invalidate the other pointers to it.
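A rough sketch of that approach, assuming a simplified ListHead that holds only the two links and heap-allocated nodes. The names mirror the kernel's list_add/list_del, but this is an illustration, not the kernel's implementation. Every access goes through raw pointers, and no &mut ListHead is ever created.

```rust
use std::ptr;

/// Hypothetical, simplified analogue of the kernel's `list_head`.
struct ListHead {
    next: *mut ListHead,
    prev: *mut ListHead,
}

/// Allocates a node that points at itself (an empty circular list).
fn list_new() -> *mut ListHead {
    let node = Box::into_raw(Box::new(ListHead {
        next: ptr::null_mut(),
        prev: ptr::null_mut(),
    }));
    // SAFETY: `node` was just produced by `Box::into_raw`, so it is valid.
    unsafe {
        (*node).next = node;
        (*node).prev = node;
    }
    node
}

/// Inserts `new` right after `head`.
/// SAFETY: both pointers must be live nodes from `list_new`.
unsafe fn list_add(new: *mut ListHead, head: *mut ListHead) {
    unsafe {
        let next = (*head).next;
        (*next).prev = new;
        (*new).next = next;
        (*new).prev = head;
        (*head).next = new;
    }
}

/// Unlinks `entry` and makes it point at itself again.
/// SAFETY: `entry` and its neighbours must be live nodes.
unsafe fn list_del(entry: *mut ListHead) {
    unsafe {
        let (prev, next) = ((*entry).prev, (*entry).next);
        (*next).prev = prev;
        (*prev).next = next;
        (*entry).next = entry;
        (*entry).prev = entry;
    }
}

/// Counts the nodes linked after `head`.
/// SAFETY: `head` and every node reachable from it must be live.
unsafe fn list_len(head: *mut ListHead) -> usize {
    unsafe {
        let mut count = 0;
        let mut cur = (*head).next;
        while cur != head {
            count += 1;
            cur = (*cur).next;
        }
        count
    }
}

/// Builds a two-element list, removes one element, and frees everything.
/// Returns the length before and after the removal.
fn demo() -> (usize, usize) {
    let (head, a, b) = (list_new(), list_new(), list_new());
    // SAFETY: all three nodes are live until they are freed at the end.
    unsafe {
        list_add(a, head);
        list_add(b, head);
        let before = list_len(head);
        list_del(a);
        let after = list_len(head);
        for node in [head, a, b] {
            drop(Box::from_raw(node));
        }
        (before, after)
    }
}

fn main() {
    println!("{:?}", demo());
}
```

Field accesses such as (*head).next read or write through the raw pointer directly, which is what keeps references out of the picture.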
This is fine, the problems arise only when you create mutable references or create a data race.
IMO the difficulty here comes from the not-yet-well-defined rules about when creating a (mutable) reference invalidates a raw pointer, and the fact that mutable references can be created automatically in surprising ways (e.g. if x is a raw pointer to a Vec, then (*x).push(1) implicitly creates a mutable reference!)
No, you don't. It's exactly as simple as "it is UB, you can't do it".
No, it can't. The mere existence of aliasing mutable references is UB, definitionally. It can't be "fixed" by locking. Locking can fix race conditions, but it can't fix the fact that the language considers aliasing &muts to be UB, regardless of data races or anything else.
So, for the reference aliasing rules, data races are the only thing I need to be careful about? Can I throw away all these rules, provided I always succeed in avoiding data races by adding my own protection and by preventing the compiler from getting rid of that protection based on its belief in things like "only one mutable reference"?
I think what you said is perfectly correct but, unfortunately, it means little in practice.
There are cases where you have to dance with the compiler and know everything it will do, instead of being an obedient good student who does nothing the teacher forbids.
Interesting question. In short, no, there is not necessarily anything visibly bad. But there might be.
The compiler assumes you do not write code that could possibly trigger undefined behaviour. That would be a bug, and you don't write bugs, right?
Given that assumption, the compiler may generate code that does not work as expected if you do happen to write code that could trigger UB. But that is only "may"; it may also just happen to generate code that never shows any visible signs of misbehaving.
Except then, misbehaviour may show up in the future with a new compiler version or compiling for a different target.
Yes, you could get to know exactly what your particular version of your compiler does for your particular architecture on a given day. With that knowledge you could write code that works even if it does break the teacher's/compiler's rules about UB or whatever.
However, what happens in the future when a new compiler version introduces optimisations or compiles things differently, and your UB then shows up as a failure? What if someone wants to compile the code for a different architecture? The different code generation could expose your UB as a failure.
I don't think this is what we want to do.
Over the years I have had to deal with a lot of C code that suffered from this "dancing with the devil". The code may have worked for years, but when a different compiler is used or a change of processor is attempted, all kinds of things go wrong. Even with something as simple as moving from 32-bit Intel to 64-bit.
It's not about "the compiler", it's about the language definition. There are many compilers: for now we have the Rust compiler at different versions and for different architectures. In the future there may be many Rust compilers; for example, GCC is gaining Rust support.
No, it doesn't "mean little in practice". This is an incorrect and very dangerous mental model. If you write code with UB, your programs will break, and they will break in subtle, mysterious, hard-to-detect, and potentially very dangerous ways. Please don't do that.
Or if you do it, let me know what code you wrote so that I can make sure to avoid it.
Not sure what you mean by "God language". But I'm sure it is best to assume the language definition is correct. As we do in all programming languages. Otherwise we would be working in a swamp of the unknown.
Sure you may find bugs in the compiler, or a particular compiler, that produce wrong code. That would be a compiler bug, report it, it will likely get fixed.
Sure you may find things that are not totally, rigorously, unambiguously defined in the language specification. As far as I understand there is no such formal Rust specification yet but it is being worked on. I suggest these are things that should be reported to the language designers.