How about we keep the 257th byte value internally for Rust to mark dead code, but not assume that uninitialized data has value 256? Instead, let's assume its value is in the range 0-255 but is unspecified.
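To make the distinction concrete, here is a minimal Rust sketch (the function is hypothetical and only illustrates the two readings; under current Rust rules this read is undefined behaviour, while under the proposal it would merely yield an unspecified byte):

use std::mem::MaybeUninit;

fn observe() -> bool {
    let x: MaybeUninit<u8> = MaybeUninit::uninit();
    // Current rules: undefined behaviour, the byte may hold the internal 257th "uninit" value.
    // Proposed semantics: v is some unspecified but ordinary value in 0..=255.
    let v: u8 = unsafe { x.assume_init() };
    v < 128 // the result is unspecified, but it is either true or false, never "poison"
}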
As for your code example, here is its spec:
#include <stdio.h>

// precondition: 0 <= x < MAX_INT
// postcondition: true
int set(int x) {
    int a;
    a = x;
}

// precondition: 0 <= y < MAX_INT
// postcondition: 0 <= \result <= MAX_INT
int add(int y) {
    int a;
    // assert 0 <= a < MAX_INT
    return a + y;
}

int main() {
    int sum;
    // assert 0 <= sum < MAX_INT
    set(2);
    // assert 0 <= sum < MAX_INT
    sum = add(3);
    // assert 0 <= sum < MAX_INT
    printf("%d\n", sum); // the spec says nothing about this being 5
}
The compiler is free to put whatever value it likes in any of the a's. The axioms of Hoare logic do not allow me to deduce anything interesting about sum.
I don't see why. The uninitialized variable is not shared between threads. The compiler should not modify the value of an uninitialized variable after it has been read, because that would violate the empty statement axiom of Hoare logic. This value is not shared in any way.
int u; // uninitialized data
int x = 3; // here the compiler could optimize by reusing u's memory location to store the value of x
int y = x + 5;
f(u); // here u is read; from now on the compiler cannot reuse u's storage to save memory or overwrite it in any way
Moreover, any code you could possibly imagine that has uninitialized data could just as well be replaced with a call to a rand() function (or something like it) and then compiled "as usual". If you can compile code with rand(), then you can remove rand() and leave the data uninitialized. The postcondition of rand() is that the value can be anything (let's also assume we do not call rand() on things like bool or fn or enums, but only use it on datatypes for which all bit patterns are valid values, like u32). Then any proof of correctness that worked with rand() will also work with uninitialized data.
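Here is a minimal sketch of that substitution argument in Rust; rand_u32 is a hypothetical helper whose only postcondition is "returns some u32", and the bound proved for with_rand uses nothing beyond that, so the same proof would carry over to an uninitialized u32 under the "unspecified but valid" semantics proposed above (under current Rust rules the uninitialized version is undefined behaviour):

// Hypothetical stand-in for a real RNG; any u32 satisfies its postcondition.
fn rand_u32() -> u32 {
    42
}

fn with_rand() -> u32 {
    // precondition: true
    let a: u32 = rand_u32(); // postcondition: a is some u32, nothing more
    a & 0xff                 // postcondition: 0 <= \result <= 255, whatever a was
}

fn with_uninit() -> u32 {
    // Undefined behaviour under current Rust rules; shown only to illustrate the argument.
    let a: u32 = unsafe { std::mem::MaybeUninit::<u32>::uninit().assume_init() };
    a & 0xff // the proof above never used more than "a is some u32"
}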
And there are two reasons why this is important. The first is that we actually use it all the time.
let buffer: *mut u8 = allocate();
let length = unix_read(buffer, "some/file/path"); // system call
print(buffer, length); // oops, formally (in the formalism that uses the 257th byte value) this is uninitialized, isn't it? How can the compiler know what unix_read did?
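For comparison, a more concrete version of the same pattern might look like the sketch below; it assumes the libc crate and an already-open file descriptor fd, and the names are illustrative rather than taken from any real codebase. The spare capacity of the Vec is uninitialized memory, and only the programmer's knowledge of what read(2) does justifies treating the first n bytes as initialized afterwards; the compiler cannot see inside the system call.

use std::io::Write;

fn read_and_print(fd: i32) {
    let mut buf: Vec<u8> = Vec::with_capacity(4096); // spare capacity is uninitialized
    let n = unsafe { libc::read(fd, buf.as_mut_ptr() as *mut libc::c_void, 4096) };
    if n > 0 {
        // We claim the kernel initialized the first n bytes; the compiler has no way
        // to deduce this from the code alone.
        unsafe { buf.set_len(n as usize) };
        std::io::stdout().write_all(&buf).unwrap();
    }
}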
The second reason is keeping consistency with some formal system of reasoning. After all, if you care about optimizations more than about logic, then why don't we compile all programs to just
int main(){
return 0;
}
This code is blazingly fast. If we don't care about having decent formal foundations, then we might as well throw out all the rules and ignore what the programmer writes.
By the way, here I was broadly referring to code that can be proved correct according to a Hoare-like formal system of reasoning. To prove things you need to deduce postconditions from axioms and the preconditions of each statement. A predicate like "was x written to" is meaningless in such proofs.