How can I execute hex opcodes directly in Rust?

How can I execute hex opcodes directly in Rust from an array of bytes?

I unfortunately don't have a direct answer, but some thoughts and conjecture, so please take it as such :slight_smile:

Presently in nightly, there is an asm! macro (tracking issue), but it seems to assume that you're writing instructions in text form, e.g. mov, rather than as hex opcodes. That usually means writing a separate implementation of the function for each target architecture.

Are you looking to execute arbitrary hex codes at runtime? The asm! macro, I believe, expects a 'static string template as opposed to a string at runtime.
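For what it's worth, asm! has since been stabilized (Rust 1.59), and its template can carry assembler directives such as .byte, so opcodes known at compile time can be embedded as raw hex. A minimal sketch, assuming an x86_64 target; the function name is made up for illustration, and the bytes are still fixed in the template rather than chosen at runtime:

```rust
/// Executes `mov eax, 42`, supplied as raw opcode bytes through `.byte`.
/// Returns `None` on architectures this sketch does not cover.
#[cfg(target_arch = "x86_64")]
fn forty_two_from_raw_bytes() -> Option<u32> {
    use std::arch::asm;

    let value: u32;
    unsafe {
        asm!(
            // b8 imm32 encodes `mov eax, imm32`; 0x2a is 42, little-endian.
            ".byte 0xb8, 0x2a, 0x00, 0x00, 0x00",
            // Tell the compiler the result lands in eax.
            out("eax") value,
            options(nomem, nostack, preserves_flags),
        );
    }
    Some(value)
}

/// Fallback for every other architecture (an assumption of this sketch).
#[cfg(not(target_arch = "x86_64"))]
fn forty_two_from_raw_bytes() -> Option<u32> {
    None
}

fn main() {
    println!("{:?}", forty_two_from_raw_bytes());
}
```

The out("eax") operand matters: without it the compiler has no idea the bytes clobber a register.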

As a corollary, if you're trying to execute arbitrary hex codes at runtime, you might have to do some dangerous and arcane memory writing: you could write the bytes of the compiled machine code to some location in memory, taking extreme care and caution, and jmp to that address.

To do this, you'd need to be intimately familiar with how the stack for your operating system/architecture works and you'd need to be extremely careful. The more I think about this, the less I think that it can be done safely.

It should go without saying that this is dangerous AF, but that's just the nature of what it seems you're describing doing.


EDIT: As soon as someone builds an assembler in Rust, it would probably be possible to generate machine code at runtime, write it to an arbitrary memory location, set your return address, and jmp away. It would be really cool and horribly frightening all at the same time :metal:

Direct execution is pretty much the antithesis of Rust – how would rustc validate that the resultant code meets Rust's unique vs. shared borrowing and lifetime constraints? Thus you'll have to use unsafe. In which case you can do it all in C/C++ through FFI.

Yeah, at this point, you might want to just declare everything unsafe:

unsafe fn main() { ... }

I don't think you need to go quite that far (though the end result is essentially the same). If you're writing a JIT compiler or some such, with the target machine code for a specific architecture with specific extensions, you could keep most of your code in Rust. However the moment you transfer control to such generated "code" you have entered the world of total unsafety.

If you are new to Rust, have you considered defining a repr(C) enum whose variants are assigned the desired hex codes and then interpreting them via match?
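A minimal sketch of that interpreter idea, in safe Rust only. The opcode values and the tiny stack-machine semantics are invented for illustration, and it uses repr(u8) rather than repr(C) so each variant is exactly one byte:

```rust
/// Hypothetical one-byte opcodes; the hex values are made up for this sketch.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Push = 0x01, // followed by one operand byte
    Add = 0x02,
    Mul = 0x03,
    Halt = 0xff,
}

impl Op {
    /// Decodes a byte into an opcode, rejecting unknown values.
    fn decode(byte: u8) -> Option<Op> {
        match byte {
            0x01 => Some(Op::Push),
            0x02 => Some(Op::Add),
            0x03 => Some(Op::Mul),
            0xff => Some(Op::Halt),
            _ => None,
        }
    }
}

/// Interprets `program` on a small stack machine and returns the value on
/// top of the stack at `Halt`. Malformed programs yield `None`.
fn run(program: &[u8]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    while pc < program.len() {
        let op = Op::decode(program[pc])?;
        pc += 1;
        match op {
            Op::Push => {
                let operand = *program.get(pc)?;
                pc += 1;
                stack.push(i64::from(operand));
            }
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_mul(b));
            }
            Op::Halt => return stack.pop(),
        }
    }
    None // ran off the end without reaching Halt
}

fn main() {
    // (2 + 3) * 4
    let program = [0x01, 2, 0x01, 3, 0x02, 0x01, 4, 0x03, 0xff];
    println!("{:?}", run(&program));
}
```

Nothing here ever runs the bytes on the CPU, so no unsafe is needed; the price is that you define and interpret your own instruction set instead of the machine's.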

I've actually made a lot of progress so far. Here's the part of my code that matters:

const MAX_BYTES: usize = 10000;
const bytes: [u8; MAX_BYTES] = [0; MAX_BYTES];
const address: *const u8 = &bytes as *const u8;
const executable_data: extern "C" fn() = unsafe {
    std::mem::transmute(address)
};

This, however, gives me this error:

error[E0080]: it is undefined behavior to use this value
  --> src/main.rs:9:1
   |
9  | / const executable_data: extern "C" fn() = unsafe {
10 | |     std::mem::transmute(address)
11 | | };
   | |__^ type validation failed: encountered a pointer, but expected a function pointer
   |
   = note: The rules on what exactly is undefined behavior aren't clear, so this check might be overzealous. Please open an issue on the rust compiler repository if you believe it should not be considered undefined behavior

I found myself wondering about this kind of problem today after reading how this guy is booting directly into Rust on a Raspberry Pi. https://www.raspberrypi.org/forums/viewtopic.php?f=72&t=247496

A useful next step would be to enable that kernel to download a Rust binary over a serial link and run it. That would save a lot of messing around rebuilding one's entire program + kernel and reflashing SD cards.

This kind of thing is easy with C/C++ and GCC; it just needs some linker script and objcopy magic to extract the raw executable bytes for the download. The kernel/bootloader would just download the bytes and jump to the start address.

Presumably there is a way to do this when building a Rust program with rustc.

As for the safety of the thing: one will have built and tested the program one wants to download on a PC first, and it will not require any unsafe sections itself, except whatever it uses to contact the outside world. That could be pretty robust.

I'd also like to know how to build such raw binaries for my RISC V core in FPGA.

Hmm... is that possibly because you're using compile time constants? I don't know if this will work, but this compiles for me without error:

const MAX_BYTES: usize = 10000;
static BYTES: [u8; MAX_BYTES] = [0; MAX_BYTES];

fn main() {
    let _executable_data: extern "C" fn() = unsafe {
        std::mem::transmute(&BYTES as *const u8)
    };
}

Playground link.

Have you considered writing raw asm files, then compiling them and linking to them via build.rs? It would avoid any issues with the Rust calling convention, and undefined behavior within Rust if you interact with the raw ASM purely through a C api.

I believe https://github.com/briansmith/ring uses this strategy for some of the crypto operations which need to be constant-time, but I don't know the details of how that works.
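A related sketch that swaps the technique slightly: instead of separate .s files assembled from build.rs, it uses global_asm! to define the routine in the same crate, and still talks to it only through a C ABI declaration. The symbol raw_add_one and the wrapper are hypothetical names, and x86_64 Linux is assumed:

```rust
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
mod raw {
    use std::arch::global_asm;

    // Hypothetical routine following the System V calling convention:
    // first integer argument arrives in edi, result is returned in eax.
    global_asm!(
        ".text",
        ".globl raw_add_one",
        "raw_add_one:",
        "    lea eax, [rdi + 1]",
        "    ret",
    );

    // Rust only ever sees the routine through this C ABI declaration.
    extern "C" {
        fn raw_add_one(x: i32) -> i32;
    }

    /// Safe wrapper around the assembly routine.
    pub fn add_one(x: i32) -> Option<i32> {
        Some(unsafe { raw_add_one(x) })
    }
}

#[cfg(not(all(target_arch = "x86_64", target_os = "linux")))]
mod raw {
    /// Fallback: no assembly implementation on this target.
    pub fn add_one(_x: i32) -> Option<i32> {
        None
    }
}

fn main() {
    println!("{:?}", raw::add_one(41));
}
```

With real .s files the extern "C" block and the wrapper would look the same; only the place where the symbol gets defined changes.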


With that said, @chrisd's version of your snippet should work, assuming that the bytes you write are correct machine code that follows the C calling convention for the platform you're targeting. You might run into issues with alignment or OS security features - I don't know enough about those to say whether you would.

For safety, I would highly recommend wrapping it in a target-specific cfg block.

fn do_thing() {
    #[cfg(all(target_arch = "x86_64", target_os = "linux"))]
    {
        // call into asm code
    }
    #[cfg(not(all(target_arch = "x86_64", target_os = "linux")))]
    {
        // panic or fallback
    }
}

In a const context the compiler complains because it tries to turn those bytes into a C function pointer at compile time, and it assumes that isn't even allowed (since it very well might not be, even with a correct function).

On most newer platforms, you have to explicitly signal to the OS to mark your pages of memory executable and writable in your page table. Otherwise, it will not work. Rust's standard library offers nothing to do that with, but you can always just use libc (or, I assume, winapi).

Here's an example of how that can be done under Linux, by using the mmap call to fetch an executable page. https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b2255ceb2a5cdaa873ebc62bbde6ab9d
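Roughly, the mmap approach can be sketched as follows. This is not the code behind the link; it assumes x86_64 Linux, declares the three libc functions by hand instead of using the libc crate, and hard-codes the constant values for that platform. It maps the page writable first and flips it to executable afterwards, so the page is never writable and executable at once:

```rust
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
mod jit {
    use std::ffi::c_void;
    use std::os::raw::c_int;

    // Hand-written bindings; the libc crate would normally provide these.
    extern "C" {
        fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int,
                fd: c_int, offset: i64) -> *mut c_void;
        fn mprotect(addr: *mut c_void, len: usize, prot: c_int) -> c_int;
        fn munmap(addr: *mut c_void, len: usize) -> c_int;
    }

    // Constant values as defined on x86_64 Linux (assumption of this sketch).
    const PROT_READ: c_int = 1;
    const PROT_WRITE: c_int = 2;
    const PROT_EXEC: c_int = 4;
    const MAP_PRIVATE: c_int = 0x02;
    const MAP_ANONYMOUS: c_int = 0x20;
    const PAGE: usize = 4096;

    /// Copies `code` into a fresh page, makes it executable, and calls it as
    /// `extern "C" fn() -> i32`. The caller must pass valid machine code.
    pub fn run(code: &[u8]) -> Option<i32> {
        if code.is_empty() || code.len() > PAGE {
            return None;
        }
        unsafe {
            let page = mmap(std::ptr::null_mut(), PAGE, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if page as isize == -1 {
                return None; // MAP_FAILED
            }
            std::ptr::copy_nonoverlapping(code.as_ptr(), page as *mut u8, code.len());
            // Drop write permission before granting execute permission.
            if mprotect(page, PAGE, PROT_READ | PROT_EXEC) != 0 {
                munmap(page, PAGE);
                return None;
            }
            let func: extern "C" fn() -> i32 = std::mem::transmute(page);
            let result = func();
            munmap(page, PAGE);
            Some(result)
        }
    }
}

#[cfg(not(all(target_arch = "x86_64", target_os = "linux")))]
mod jit {
    /// Fallback: this sketch does not support other targets.
    pub fn run(_code: &[u8]) -> Option<i32> {
        None
    }
}

fn main() {
    // mov eax, 42 ; ret
    let code = [0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3];
    println!("{:?}", jit::run(&code));
}
```

A hardened system may refuse the mprotect call, in which case run returns None instead of executing anything.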

That's great.

Of course on a bare metal Pi or my RISC V in FPGA there are no such problems with execute/write permissions.

Here is how you can do it in a const fashion:

#[cfg(target_os = "linux")]
fn hello_world () -> !
{
    mod namespaced {
        #[link(section = ".text")]
        #[no_mangle]
        static hello_world: [u8; 38] =
            *b"\x6a\x01\x5f\x89\xf8\x6a\x0d\x5a\xeb\x0a\x5e\x0f\x05\x6a\x3c\x58\xff\xcf\x0f\x05\xe8\xf1\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x0a"
        ;
    }
    extern "C" {
        fn hello_world () -> !;
    }
    unsafe {
        hello_world()
    }
}
  • (The machine code calls write(1, "Hello world!\n", 13) and then exit(0), so it effectively never returns)

  • Playground

A variation on @Yandros's version :slight_smile:

use std::mem;

#[no_mangle]
#[link_section = ".text"]
static hello_world: [u8; 38] =
    *b"\x6a\x01\x5f\x89\xf8\x6a\x0d\x5a\xeb\x0a\x5e\x0f\x05\x6a\x3c\x58\xff\xcf\x0f\x05\xe8\xf1\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x0a";

fn main() {
    let exec_data: extern "C" fn() -> ! =
        unsafe { mem::transmute(&hello_world as *const _ as *const ()) };
    exec_data();
}

The shellcode is actually:

0x0         6a 01                 push 0x1
0x2         5f                    pop rdi
0x3         89 f8                 mov eax, edi
0x5         6a 0d                 push 0xd
0x7         5a                    pop rdx
0x8         eb 0a                 jmp get_data

print_and_exit:
0xa         5e                    pop rsi    ; "Hello world!"
0xb         0f 05                 syscall    ; sys_write
0xd         6a 3c                 push 0x3c
0xf         58                    pop rax
0x10        ff cf                 dec edi
0x12        0f 05                 syscall    ; sys_exit

get_data:
0x14        e8 f1 ff ff ff        call print_and_exit
0x19        "Hello world!\n"
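The offsets in this listing can be checked mechanically without executing anything. A small sketch in safe Rust; the helper names are made up, and it only knows the two relative-branch encodings used here (eb rel8 and e8 rel32):

```rust
/// The 38 bytes from the snippets above.
const SHELLCODE: [u8; 38] =
    *b"\x6a\x01\x5f\x89\xf8\x6a\x0d\x5a\xeb\x0a\x5e\x0f\x05\x6a\x3c\x58\xff\xcf\x0f\x05\xe8\xf1\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x0a";

/// Target of a short `jmp rel8` (opcode 0xeb) located at offset `at`.
fn jmp_rel8_target(code: &[u8], at: usize) -> Option<usize> {
    if *code.get(at)? != 0xeb {
        return None;
    }
    let rel = i64::from(*code.get(at + 1)? as i8);
    // Relative to the end of the 2-byte instruction.
    usize::try_from(at as i64 + 2 + rel).ok()
}

/// Target of a near `call rel32` (opcode 0xe8) located at offset `at`.
fn call_rel32_target(code: &[u8], at: usize) -> Option<usize> {
    if *code.get(at)? != 0xe8 {
        return None;
    }
    let operand = code.get(at + 1..at + 5)?;
    let rel = i64::from(i32::from_le_bytes([operand[0], operand[1], operand[2], operand[3]]));
    // Relative to the end of the 5-byte instruction.
    usize::try_from(at as i64 + 5 + rel).ok()
}

fn main() {
    // jmp at 0x8 lands on get_data, call at 0x14 lands on print_and_exit.
    println!("jmp  -> {:#x?}", jmp_rel8_target(&SHELLCODE, 0x8));
    println!("call -> {:#x?}", call_rel32_target(&SHELLCODE, 0x14));
    // The call pushes its return address, 0x19, which is where the text starts.
    println!("text -> {:?}", String::from_utf8_lossy(&SHELLCODE[0x19..]));
}
```

The text after the call is 13 bytes including the trailing newline, which matches the 0xd loaded into rdx.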

Playground

I still do not understand why #[link_section = ".text"] works but #[link(section = ".text")] didn't :frowning:
