I want to write a very unsafe program in Rust, but I can't!

Here is a classic piece of C code that is very dangerous.

#include <stdio.h>

int a = 67305985;
char arr[4] = {0x06, 0x07, 0x08, 0x09};

int main() {
    char *b = arr;
    int c = -2;
    c[b] = 0x05; // c[b] -> *(c + b)

    // what's the value of a?
    printf("%d\n", a);
    return 0;
}

Godbolt

It is dangerous because, even though a never appears anywhere in main(), its value gets changed to some other, unspecified integer by the out-of-bounds access to the array.

I want to replicate this code in Rust.

This is my attempt:

static mut a: i32 = 67305985;
static arr: [u8; 4] = [0x06, 0x07, 0x08, 0x09];

fn main(){
    
    let b: *const [u8; 4] = &arr as *const [u8; 4];
    
    let c: i32 = -2;
    
    unsafe {
        *((b as *mut isize).offset(c as isize)) = 0x05;
    }
    
    // what is the value of `a`?
    println!("{}", unsafe {a} as isize);
}

Playground

It compiles but panics.

I just want it to print a, is it possible?

The panic is a debug assertion, if you compile in release mode, it runs.

I don't think any of this is guaranteed, though, because the code is very obviously UB on multiple counts.
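For what it's worth, here is a small sketch of two ways the Rust attempt differs from the C original, using only safe pointer arithmetic. First, `.offset(c)` on a `*mut isize` steps in units of `isize`, not bytes, and the store then writes a whole `isize`. Second, a pointer that started life as a `u8` pointer has no reason to be aligned for `isize`, which is probably what the debug assertion is complaining about (that part is my assumption; I haven't checked the panic message). The function names are made up for illustration.

```rust
use std::mem::align_of;

/// Returns how many bytes an `isize` pointer moves for `.offset(count)`.
/// Uses `wrapping_offset`, so no unsafe code and no dereference is involved.
fn byte_step_for_isize_offset(count: isize) -> isize {
    let words = [0isize; 4];
    // Start in the middle of the array so that small negative counts stay inside it.
    let base = words.as_ptr().wrapping_add(2);
    let moved = base.wrapping_offset(count);
    (moved as isize) - (base as isize)
}

/// Returns true if `p` could be dereferenced as an `isize` without
/// violating the alignment requirement of `isize`.
fn is_aligned_for_isize(p: *const u8) -> bool {
    (p as usize) % align_of::<isize>() == 0
}
```

So `offset(-2)` in the attempt moves back two whole `isize` values (16 bytes on a typical 64-bit target) rather than the 2 bytes the C code moves, and a byte pointer one past an aligned address fails the alignment check.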

Thank you, in release mode the program does run.

However, to my shock, the value of a did not change. That says something about how Rust lays out statics in the object file.

Your C program is not guaranteed to produce that result - in fact, it's not guaranteed to produce any result. You can certainly observe that on a specific C implementation and at specific build settings, but it's not a property of the program you're seeing; it's a property of how the implementation handles this specific undefined program.

It would be equally valid for your C program to crash on startup, for the compiler to reject it, or any of a number of other, more esoteric outcomes. One valid translation of your C program discards the write through c[b] (since it is never read), optimizes c away (since it is neither written to nor read from after the previous optimization), and optimizes a away entirely (since it is initialized with a constant, and never written to). That would reduce the program down to

#include <stdio.h>

int main() {
  printf("%d\n", 67305985);
  return 0;
}

I was able to produce several translations close to this, though none that are exactly this, in both clang and gcc. However the program is not guaranteed to do this, either: it's an undefined program, so there is no well-specified result.

Nothing in Rust stops you from writing under-defined programs outright. However, safe Rust is intended to be free from undefined behaviour, and writing outside the bounds of an array is undefined behaviour, so you can't do that in safe Rust.
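To make that concrete, here is a minimal sketch of what the same write looks like in safe Rust. `write_at` is a hypothetical helper, not anything from the standard library; it takes a signed index so that the `-2` from the C program can be expressed at all.

```rust
/// Writes `value` at `index` if, and only if, the index is inside the slice.
/// A negative or too-large index is reported as an error instead of
/// touching neighbouring memory.
fn write_at(arr: &mut [u8], index: isize, value: u8) -> Result<(), String> {
    if index < 0 {
        return Err(format!("negative index {index}"));
    }
    let idx = index as usize;
    let len = arr.len();
    // `get_mut` returns `None` for an out-of-bounds index.
    let slot = arr
        .get_mut(idx)
        .ok_or_else(|| format!("index {idx} out of bounds for length {len}"))?;
    *slot = value;
    Ok(())
}
```

Plain indexing (`arr[idx] = value`) performs the same bounds check, except that it panics instead of returning an error. Either way, there is no safe operation that lets index -2 reach a.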

I just want it to print a, is it possible?

The only value a can legally have in a well-specified program would be 67305985, since there are no writes to a anywhere in your program. The compiler is, under normal circumstances, allowed to rely on that observation to do things like eliminate the variable. However, the out-of-bounds write through b means that the program is no longer well-specified, so there is no single correct output. Your expectation - that the program output 5 - is as invalid as the expectation that it output 67305985.

No, it doesn't say anything about Rust, because the program source is meaningless by definition.


If we put language lawyering aside, it's still not about statics or memory layout. The optimizer likely detected that you are modifying something that's not supposed to be mutable, and assumed such an erroneous mutation doesn't happen. If the program had actually attempted to write to a read-only memory segment, it would have been killed by the OS with an access violation or something similar.

And in fact it doesn’t produce that result if compiled in GCC with optimization enabled.

Great! This reminds me of the Belgian cartoon Gaston Lagaffe, in which Gaston invents a machine that can throw and catch a bowling ball. The machine destroys itself during a short moment of clarity.

It also produces different results in OP's chosen compiler (MSVC), depending on the optimization level. It outputs 67305985 at -O2, consistent with the compiler noticing that the value must be a constant because it's not written to, but outputs 67437057 with the default options, consistent with a translation that does rote line-by-line access to memory and that lays out the variables in adjacent addresses. Neither is guaranteed by the language.
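The 67437057 figure can be reproduced by hand, and the arithmetic is easy to check in safe Rust. This is only a sketch under two assumptions: the target is little-endian, and a sits in the four bytes immediately before arr, so that index -2 lands on the third byte of a. `patch_byte` is a name made up for this example.

```rust
/// Replaces one byte of `value`, counting bytes in little-endian order
/// (byte 0 is the least significant byte).
fn patch_byte(value: i32, byte_index: usize, new_byte: u8) -> i32 {
    let mut bytes = value.to_le_bytes();
    bytes[byte_index] = new_byte;
    i32::from_le_bytes(bytes)
}
```

67305985 is 0x04030201, stored as the bytes 01 02 03 04. Overwriting byte 2 with 0x05 gives 01 02 05 04, which is 0x04050201, and `patch_byte(67305985, 2, 0x05)` returns 67437057 accordingly.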

Yeah, I guess the conclusion is that it is meaningless to try to reason about what the output of the program will be, since it's totally undefined. Even if you pin down the compiler (whether one of the C compilers or the one and only rustc), you are never guaranteed that the observed output will stay consistent, e.g. in larger programs, across different versions of the compiler, across different compilers, or even across different optimization settings.

But guys, let's just say I want to be a very, very naughty programmer: I want to change a by manipulating the bytes of a, instead of writing code that changes it in the 'usual' unsafe way, e.g. unsafe { a = 1234 };. Is that possible?

No, not really.
If you let the compiler know you're changing a, you're not really being very very naughty.
And if you don't let the compiler know you're changing a, the compiler is under no obligation to leave a in a predictable location where you can change its bytes.

No. You can't get the address of a (or any other variable) other than from itself. It has a unique location unrelated to any other thing from the language's perspective. The "object file" simply does not exist at this level.

Guys, I don't get it. I let the compiler know that I might be changing a with that mut in static mut, so why would it not be under an obligation to put a at some predictable location?

It's under an obligation to put it at a location, and ensure that location is writable, but nothing specifies exactly which location it should be in (either in terms of the exact address or its position relative to other things), and the compiler is free to adjust that for whatever purpose it needs. It's also allowed to see that you aren't actually writing to it (since it isn't being exported) and optimise those things out if it wouldn't change anything in the defined behaviour of the program.

Yes, you can change a by manipulating individual bytes of a:

use std::ptr;

static mut A: i32 = 67305985;

fn main(){
    unsafe {
        let p = ptr::addr_of_mut!(A) as *mut u8;
        *p = 5;
        *p.offset(1) = 0;
        *p.offset(2) = 0;
        *p.offset(3) = 0;
        println!("{}", A);
    }
}

Playground
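A variant of the same idea, as a sketch: it takes the new bytes from `to_ne_bytes`, so it doesn't assume a little-endian target, and it reads `A` back through a raw pointer instead of naming it inside `println!`, which avoids taking a reference to a `static mut`. It assumes nothing else touches `A` at the same time (single-threaded use), and `overwrite_a_bytewise` is a made-up name.

```rust
use std::ptr;

static mut A: i32 = 67305985;

/// Overwrites `A` one byte at a time and returns the value read back.
/// Assumes no other thread accesses `A` while this runs.
fn overwrite_a_bytewise(new_value: i32) -> i32 {
    // Bytes in the target's native order, so the result is the same
    // on little-endian and big-endian machines.
    let bytes = new_value.to_ne_bytes();
    unsafe {
        let p = ptr::addr_of_mut!(A) as *mut u8;
        for (i, byte) in bytes.iter().enumerate() {
            // Each write stays inside the four bytes that belong to `A`.
            p.add(i).write(*byte);
        }
        // Read through a raw pointer; no `&A` reference is created.
        ptr::addr_of!(A).read()
    }
}
```

The key difference from the out-of-bounds version is that the pointer is derived from `A` itself, so the compiler knows `A` may have been written.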

Thank you for your explanation, it is clear and insightful.

My sincere apologies for not being clear and thank you for your response.

The intention of the C code at the start of this topic was to illustrate how out-of-bounds indexing is a security vulnerability, as it can manipulate bytes that it is not supposed to.

I want to recreate the same code in Rust, such that when I write at a certain offset from a completely different variable, in this case b, it results in a change to a's value as well.

I would say this is even more of a reason for the compiler not to be predictable, as it would help exploiting potential vulnerabilities.

You can force a known layout with repr(C) on a struct. That should be enough to demonstrate out-of-bounds writes while keeping the compiler happy?
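Something along these lines, as a rough sketch (the struct and function names are invented for the example). With repr(C) the fields are laid out in declaration order, so `arr` starts right after the four bytes of `a`. The absolute address of the struct is still up to the compiler, but only the relative layout matters here. The pointer is derived from the whole struct, so stepping back from `arr` stays inside the same object and the write is well-defined as far as I can tell.

```rust
use std::mem::size_of;
use std::ptr;

#[repr(C)]
struct Neighbours {
    a: i32,
    arr: [u8; 4],
}

/// Writes 0x05 two bytes before the start of `arr` and returns both fields.
fn write_before_arr() -> (i32, [u8; 4]) {
    let mut pair = Neighbours {
        a: 67305985,
        arr: [0x06, 0x07, 0x08, 0x09],
    };
    // Pointer to the whole struct, viewed as bytes.
    let base = ptr::addr_of_mut!(pair) as *mut u8;
    unsafe {
        // With repr(C), `arr` begins at offset size_of::<i32>() (no padding
        // is needed before a field with alignment 1).
        let arr_start = base.add(size_of::<i32>());
        // "Index -2" relative to `arr`: this lands on the third byte of `a`.
        *arr_start.offset(-2) = 0x05;
    }
    (pair.a, pair.arr)
}
```

On a little-endian machine the returned `a` should be 67437057 (0x04050201), while `arr` is untouched. This demonstrates the effect of the stray write without relying on where the compiler happens to place two unrelated statics.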

Hi, thanks for your response.

But even though that struct has the C repr, isn't it still unlikely to stay at a predictable address for an out-of-bounds access to mutate it?