How can I execute hex opcodes directly in Rust from an array of bytes?
I unfortunately don't have a direct answer, only some thoughts and conjecture, so please take it as such.
Presently in nightly, there is an asm! macro (tracking issue), but it seems to assume that you're writing text-format instructions (e.g. mov) rather than the hex code, and this usually means that you'll have multiple implementations of the function you're building to target different architectures.
Are you looking to execute arbitrary hex codes at runtime? The asm! macro, I believe, expects a 'static string template as opposed to a string at runtime.
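If the bytes are known at compile time, one option is to put them into the asm! template with the assembler's .byte directive. This is only a sketch under assumptions: it assumes x86_64 and a toolchain where the macro is available as std::arch::asm (stable since Rust 1.59, as far as I know), and forty_two is a made-up name for illustration. The five bytes encode mov eax, 42.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Hypothetical helper: runs five raw opcode bytes that were fixed at compile time.
#[cfg(target_arch = "x86_64")]
fn forty_two() -> u32 {
    let out: u32;
    unsafe {
        asm!(
            // b8 2a 00 00 00 = mov eax, 42 (the template must still be a literal)
            ".byte 0xb8, 0x2a, 0x00, 0x00, 0x00",
            // Tell the compiler the result ends up in eax.
            out("eax") out,
            options(nomem, nostack),
        );
    }
    out
}
```

This does not help with bytes that only exist at runtime, since the template is still a string literal.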
As a corollary, if you're trying to execute arbitrary hex codes at runtime, you might have to do some dangerous and arcane writing of memory; you could write the actual byte representations of compiled machine code to an arbitrary location in memory, taking extreme care and caution, and jmp to that address.
To do this, you'd need to be intimately familiar with how the stack for your operating system/architecture works and you'd need to be extremely careful. The more I think about this, the less I think that it can be done safely.
It should go without saying that this is dangerous AF, but that's just the nature of what it seems you're describing doing.
EDIT: As soon as someone builds an assembler in Rust, it would probably be possible to generate machine code at runtime, write it to an arbitrary memory location, set your return address, and jmp away. It would be really cool and horribly frightening all at the same time.
Direct execution is pretty much the antithesis of Rust – how would rustc validate that the resultant code meets Rust's unique vs. shared borrowing and lifetime constraints? Thus you'll have to use unsafe. In which case you can do it all in C/C++ through FFI.
Yeah, at this point, you might want to just declare everything unsafe:
unsafe fn main() { ... }
I don't think you need to go quite that far (though the end result is essentially the same). If you're writing a JIT compiler or some such, with the target machine code for a specific architecture with specific extensions, you could keep most of your code in Rust. However the moment you transfer control to such generated "code" you have entered the world of total unsafety.
If you are new to Rust, have you considered defining a repr(C) enum whose variants are assigned the desired hex codes and then interpreting them via match?
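A minimal sketch of that interpreter idea, to make it concrete. Everything here is hypothetical: the opcode values, the Op and interpret names, and the tiny stack machine are made up for illustration. It uses repr(u8) rather than repr(C) so each variant fits in one byte, and it decodes with match instead of transmute so an unknown byte is rejected rather than being undefined behavior.

```rust
/// Hypothetical opcode set; the hex values are arbitrary.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Op {
    Push = 0x01, // followed by one immediate byte
    Add = 0x02,
    Mul = 0x03,
    Halt = 0xff,
}

impl Op {
    /// Maps a raw byte to an opcode, returning None for unknown bytes.
    fn decode(byte: u8) -> Option<Op> {
        match byte {
            0x01 => Some(Op::Push),
            0x02 => Some(Op::Add),
            0x03 => Some(Op::Mul),
            0xff => Some(Op::Halt),
            _ => None,
        }
    }
}

/// Runs the program on a small stack machine and returns the top of the stack.
fn interpret(program: &[u8]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut pc = 0;
    while pc < program.len() {
        let op = Op::decode(program[pc])?;
        pc += 1;
        match op {
            Op::Push => {
                // The byte after Push is the value to push.
                stack.push(*program.get(pc)? as i64);
                pc += 1;
            }
            Op::Add => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a + b);
            }
            Op::Mul => {
                let (b, a) = (stack.pop()?, stack.pop()?);
                stack.push(a * b);
            }
            Op::Halt => break,
        }
    }
    stack.pop()
}
```

For example, the program [0x01, 2, 0x01, 3, 0x02, 0x01, 4, 0x03, 0xff] pushes 2 and 3, adds them, pushes 4, and multiplies. No unsafe is needed because nothing is ever handed to the CPU as machine code.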
I've actually made a lot of progress so far. Here's the part of my code that matters:
const MAX_BYTES: usize = 10000;
const bytes: [u8; MAX_BYTES] = [0; MAX_BYTES];
const address: *const u8 = &bytes as *const u8;
const executable_data: extern "C" fn() = unsafe {
std::mem::transmute(address)
};
This, however, gives me this error:
error[E0080]: it is undefined behavior to use this value
--> src/main.rs:9:1
|
9 | / const executable_data: extern "C" fn() = unsafe {
10 | | std::mem::transmute(address)
11 | | };
| |__^ type validation failed: encountered a pointer, but expected a function pointer
|
= note: The rules on what exactly is undefined behavior aren't clear, so this check might be overzealous. Please open an issue on the rust compiler repository if you believe it should not be considered undefined behavior
I found myself wondering about this kind of problem today after reading how this guy is booting directly into Rust on a Raspberry Pi. Announce: RusPiRo - a kernel the Rust way ;) - Raspberry Pi Forums
A useful next step would be to enable that kernel to download a Rust binary over a serial link and run it. That would save a lot of messing around rebuilding one's entire program + kernel and reflashing SD cards.
This kind of thing is easy with C/C++ and GCC, just needs some linker script and objdump magic to extract the raw executable bytes for the download. The kernel/bootloader would just download the bytes and jump to the start address.
Presumably there is a way to do this when building a Rust program with rustc.
As for the safety of the thing, well, one will have built and tested the program one wants to download on a PC first, it will not require any unsafe sections itself, except whatever it uses to contact the outside world. That could be pretty robust.
I'd also like to know how to build such raw binaries for my RISC-V core in FPGA.
Hmm... is that possibly because you're using compile time constants? I don't know if this will work, but this compiles for me without error:
const MAX_BYTES: usize = 10000;
static BYTES: [u8; MAX_BYTES] = [0; MAX_BYTES];
fn main() {
let _executable_data: extern "C" fn() = unsafe {
std::mem::transmute(&BYTES as *const u8)
};
}
Have you considered writing raw asm files, then compiling them and linking to them via build.rs? It would avoid any issues with the Rust calling convention, and undefined behavior within Rust if you interact with the raw ASM purely through a C API.
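To show the "talk to the assembly only through a C API" part, here is a rough sketch. Note that it is not the build.rs approach itself: it swaps in std::arch::global_asm! so the example fits in one file, whereas a separate .s file compiled from build.rs would be declared on the Rust side in the same way. The symbol add_one_asm is hypothetical, and the sketch assumes x86_64 Linux (ELF, System V calling convention).

```rust
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
use std::arch::global_asm;

// Hypothetical assembly routine: returns its first integer argument plus one.
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
global_asm!(
    ".pushsection .text",
    ".globl add_one_asm",
    "add_one_asm:",
    "lea eax, [rdi + 1]", // first argument arrives in edi, result goes in eax
    "ret",
    ".popsection",
);

// Rust only ever sees the routine through this C ABI declaration.
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
extern "C" {
    fn add_one_asm(x: i32) -> i32;
}
```

Calling it still needs unsafe, e.g. unsafe { add_one_asm(41) }, but the unsafety is confined to that one call site.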
I believe https://github.com/briansmith/ring uses this strategy for some of the crypto operations which need to be constant-time, but I don't know the details of how that works.
With that said, @chrisd's version of your snippet should work, assuming that you write correct bytecode to implement the C calling convention for the platform you're targeting. You might run into issues with alignment or OS security features - I don't know enough about those to say whether you would.
For safety, I would highly recommend wrapping it in a target-specific cfg block.
fn do_thing() {
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
{
// call into asm code
}
#[cfg(not(all(target_arch = "x86_64", target_os = "linux")))]
{
// panic or fallback
}
}
In const it's complaining because it will try to turn those bytes into a C-function-ptr at compile time, and the compiler assumes that isn't even allowed (since it very well might not be, even with a correct function).
On most newer platforms, you have to explicitly signal to the OS to mark your pages of memory executable and writable in your page table. Otherwise, it will not work. Rust's standard library offers nothing to do that with, but you can always just use libc (or, I assume, winapi).
Here's an example of how that can be done under Linux, by using the mmap call to fetch an executable page. Rust Playground
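For readers without the Playground link, a minimal sketch of the same idea follows. It is not the code from the link. It assumes x86_64 Linux, declares the libc functions by hand so that it builds without extra crates (in real code the libc crate exports the same names), and hard-codes the Linux values of the constants. The name run_machine_code is made up. Instead of a page that is writable and executable at once, it maps the page writable, copies the bytes, then switches it to executable with mprotect.

```rust
use std::ffi::c_void;

// Hand-written declarations and Linux x86_64 constant values (assumptions;
// prefer the `libc` crate, which provides all of these).
extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: i32, flags: i32, fd: i32, offset: i64) -> *mut c_void;
    fn mprotect(addr: *mut c_void, len: usize, prot: i32) -> i32;
    fn munmap(addr: *mut c_void, len: usize) -> i32;
}
const PROT_READ: i32 = 1;
const PROT_WRITE: i32 = 2;
const PROT_EXEC: i32 = 4;
const MAP_PRIVATE: i32 = 0x02;
const MAP_ANONYMOUS: i32 = 0x20;

/// Hypothetical helper: copies `code` into a fresh page, makes the page
/// executable, calls it as `extern "C" fn() -> i32`, then unmaps the page.
/// The caller must guarantee the bytes really are such a function.
#[cfg(all(target_arch = "x86_64", target_os = "linux"))]
unsafe fn run_machine_code(code: &[u8]) -> Option<i32> {
    let len = 4096; // one page is plenty for this sketch
    if code.len() > len {
        return None;
    }
    let page = mmap(
        std::ptr::null_mut(),
        len,
        PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS,
        -1,
        0,
    );
    if page as isize == -1 {
        return None; // mmap returned MAP_FAILED
    }
    std::ptr::copy_nonoverlapping(code.as_ptr(), page as *mut u8, code.len());
    // Drop write permission and add execute permission before jumping in.
    if mprotect(page, len, PROT_READ | PROT_EXEC) != 0 {
        munmap(page, len);
        return None;
    }
    let f: extern "C" fn() -> i32 = std::mem::transmute(page);
    let result = f();
    munmap(page, len);
    Some(result)
}
```

As a usage example, the six bytes b8 2a 00 00 00 c3 encode mov eax, 42 followed by ret, so unsafe { run_machine_code(&[0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3]) } should hand back 42 on a machine that permits the mapping.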
That's great.
Of course on a bare metal Pi or my RISC-V in FPGA there are no such problems with execute/write permissions.
Here is how you can do it in a const fashion:
#[cfg(target_os = "linux")]
fn hello_world () -> !
{
mod namespaced {
// #[link(section = ".text")]
#[link_section = ".text"] /* EDIT */
#[no_mangle]
static hello_world: [u8; 38] =
*b"\x6a\x01\x5f\x89\xf8\x6a\x0d\x5a\xeb\x0a\x5e\x0f\x05\x6a\x3c\x58\xff\xcf\x0f\x05\xe8\xf1\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x0a"
;
}
extern "C" {
fn hello_world () -> !;
}
unsafe {
hello_world()
}
}
- (The byte code calls write(1, "Hello world!\n", 13) and then exit(0), so that it effectively never returns)
- Playground (EDIT: fixed the link section annotation as shown in this thread)
A cloned version of @Yandros's example:
use std::mem;
#[no_mangle]
#[link_section = ".text"]
static hello_world: [u8; 38] =
*b"\x6a\x01\x5f\x89\xf8\x6a\x0d\x5a\xeb\x0a\x5e\x0f\x05\x6a\x3c\x58\xff\xcf\x0f\x05\xe8\xf1\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x0a";
fn main() {
let exec_data: extern "C" fn () -> ! = unsafe { mem::transmute(&hello_world as *const _ as *const ()) };
exec_data();
}
The shellcode is actually:
0x0 6a 01 push 0x1
0x2 5f pop rdi
0x3 89 f8 mov eax, edi
0x5 6a 0d push 0xd
0x7 5a pop rdx
0x8 eb 0a jmp get_data
print_and_exit:
0xa 5e pop rsi ; "Hello world!"
0xb 0f 05 syscall ; sys_write
0xd 6a 3c push 0x3c
0xf 58 pop rax
0x10 ff cf dec edi
0x12 0f 05 syscall ; sys_exit
get_data:
0x14 e8 f1 ff ff ff call print_and_exit
0x19 "Hello world!"
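The two relative branches in that listing can be checked by hand, since a relative target is the address of the next instruction plus the signed displacement. Here is a small sketch of that arithmetic. The helper names are made up, and it only understands the two encodings used here (EB rel8 and E8 rel32).

```rust
const SHELLCODE: [u8; 38] =
    *b"\x6a\x01\x5f\x89\xf8\x6a\x0d\x5a\xeb\x0a\x5e\x0f\x05\x6a\x3c\x58\xff\xcf\x0f\x05\xe8\xf1\xff\xff\xff\x48\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64\x21\x0a";

/// Target of a short jump (opcode EB, signed 8-bit displacement) at offset `at`.
fn jmp_rel8_target(code: &[u8], at: usize) -> usize {
    assert_eq!(code[at], 0xeb);
    let rel = code[at + 1] as i8 as isize;
    // The displacement is relative to the end of the 2-byte instruction.
    (at as isize + 2 + rel) as usize
}

/// Target of a near call (opcode E8, signed little-endian 32-bit displacement).
fn call_rel32_target(code: &[u8], at: usize) -> usize {
    assert_eq!(code[at], 0xe8);
    let rel = i32::from_le_bytes([code[at + 1], code[at + 2], code[at + 3], code[at + 4]]) as isize;
    // The displacement is relative to the end of the 5-byte instruction.
    (at as isize + 5 + rel) as usize
}
```

With these, the jmp at 0x8 lands on 0x14 (get_data) and the call at 0x14 lands on 0xa (print_and_exit). The call pushes 0x19 as its return address, which is exactly where the 13 string bytes start, and that is how pop rsi ends up pointing at the message.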
I still do not understand why #[link_section = ".text"] works but #[link(section = ".text")] didn't.