I want to read the Export Address Table (EAT) of a DLL loaded in another process that I am attached to as a debugger.
To do that I receive a LOAD_DLL_DEBUG_EVENT event containing lpBaseOfDll: *mut c_void.
Later on I find the offset from that pointer to the EAT by reading process memory (by looking up IMAGE_NT_HEADERS and reading the RVA at IMAGE_DIRECTORY_ENTRY_EXPORT, but this is not important to the question).
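For readers unfamiliar with that lookup, here is a minimal sketch of it, assuming the first bytes of the module have already been copied out of the debuggee into a local buffer. The function names are hypothetical, and the sketch skips the NumberOfRvaAndSizes check for brevity. It returns the RVA and size of the export directory, which is the structure that in turn points at the EAT.

```rust
/// Reads a little-endian u16 at `off`, or None if it is out of bounds.
fn read_u16(buf: &[u8], off: usize) -> Option<u16> {
    let b = buf.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([b[0], b[1]]))
}

/// Reads a little-endian u32 at `off`, or None if it is out of bounds.
fn read_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical helper: returns (rva, size) of the export data directory
/// from a local copy of a module's PE headers.
fn export_directory(headers: &[u8]) -> Option<(u32, u32)> {
    // DOS header starts with "MZ"; e_lfanew lives at offset 0x3C.
    if headers.get(..2)? != b"MZ" {
        return None;
    }
    let nt = read_u32(headers, 0x3C)? as usize;
    if headers.get(nt..nt.checked_add(4)?)? != b"PE\0\0" {
        return None;
    }
    // Optional header follows the 4-byte signature and 20-byte file header.
    let opt = nt.checked_add(24)?;
    // The data directory array starts at a different offset for PE32 and PE32+.
    let dir = match read_u16(headers, opt)? {
        0x10b => opt.checked_add(96)?,  // PE32
        0x20b => opt.checked_add(112)?, // PE32+
        _ => return None,
    };
    // Entry 0 of the array is the export directory: RVA, then size.
    Some((read_u32(headers, dir)?, read_u32(headers, dir.checked_add(4)?)?))
}
```

All of this works on a plain byte slice in the debugger's own memory, so no raw pointers are involved until the RVA has to be combined with the module base.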
Having a pointer and an offset, I need to do pointer arithmetic to build an address, which I will then pass to the ReadProcessMemory function to read memory inside the debuggee.
But pointer arithmetic is unsafe and has preconditions, and I am not even working with memory in my own process, let alone with allocated objects.
Is there a way I can safely do that? Maybe by converting pointers to usize?
Here is some pseudo-code to illustrate the problem.
enum Event {
    LoadDll(*mut std::ffi::c_void),
    ...
}
...
let Event::LoadDll(base_of_dll) = wait_for_events()? else {
    continue;
};
let offset = get_eat_offset(process_handle, base_of_dll)?;
let eat_ptr = unsafe { base_of_dll.byte_offset(offset) }; // Undefined Behavior?
let _eat_ptr = (base_of_dll as usize + offset) as *mut std::ffi::c_void; // And this?
let eat = read_process_memory(process_handle, eat_ptr); // This is its own can of worms with proper alignment, uninitialized memory and dynamic structs
(I am not at all an expert in this, just looking at the documentation, someone please correct this if it is wrong)
The documentation says that
An allocated object is a subset of program memory which is addressable from Rust, and within which pointer arithmetic is possible.
...
An allocated object has a base address, a size, and a set of memory addresses.
It seems like the list of required elements for an "allocated object" is:
Base address
Size
Set of valid memory addresses
Addressable from Rust
If that is all that is required, then the entire process the program is attached to seems to be a valid allocated object, in which case pointer arithmetic would be valid.
As for how to do the pointer arithmetic, it seems like the safest way to do that would be wrapping_byte_offset, but byte_offset should also be fine in this case (it seems like the "wrapping" refers to whether computing the offset overflows isize rather than whether the pointer arithmetic wraps through the address space).
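As a minimal sketch of the wrapping variant: wrapping_byte_offset is a safe method with no in-bounds requirement, so it can be called on a pointer that only has meaning in the debuggee. Only dereferencing the result locally would be a problem. The helper name and the addresses used below are made up for illustration.

```rust
use std::ffi::c_void;

/// Hypothetical helper: offsets a debuggee pointer without ever dereferencing it.
/// No `unsafe` is needed because `wrapping_byte_offset` is a safe method.
fn remote_offset(base: *mut c_void, offset: isize) -> *mut c_void {
    base.wrapping_byte_offset(offset)
}
```

For example, remote_offset(0x1000_0000 as *mut c_void, 0x40) yields a pointer whose address is 0x1000_0040, which can then be handed to an API that expects a pointer into the other process.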
This pointer is not addressable from Rust though! You cannot dereference it in Rust with *, the only thing you can do with it is pass it to WinAPI functions (like ReadProcessMemory, WriteProcessMemory, etc.), which will then do a syscall and the kernel itself will read this memory.
It seems to me that these functions are really only using addresses, not pointers, and should potentially be defined that way as well. I would immediately convert the pointer to a usize and use that, which would have the advantage that the arithmetic would be much easier than using wrapping_byte_offset() everywhere.
Unfortunately the official Windows API Metadata doesn't define the value as anything other than a C void *, so realistically it's probably stuck that way in the Rust API definition.
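The address-as-usize idea could be sketched roughly like this. RemoteAddr is a hypothetical newtype, not something from the windows crates. It keeps the arithmetic in plain integers, with overflow checked, and converts back to a raw pointer only at the point where an API signature asks for one.

```rust
use std::ffi::c_void;

/// Hypothetical newtype: an address in the debuggee, never dereferenced locally.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RemoteAddr(pub usize);

impl RemoteAddr {
    /// Keeps only the numeric address of a pointer reported by the debug API.
    pub fn from_ptr(ptr: *mut c_void) -> Self {
        RemoteAddr(ptr as usize)
    }

    /// Adds an RVA, returning None instead of wrapping on overflow.
    pub fn checked_add(self, rva: u32) -> Option<Self> {
        self.0.checked_add(rva as usize).map(RemoteAddr)
    }

    /// Converts back to a raw pointer only where an API signature demands one.
    pub fn as_ptr(self) -> *const c_void {
        self.0 as *const c_void
    }
}
```

With this, RemoteAddr::from_ptr(base_of_dll).checked_add(rva) gives the address of the export data, and as_ptr() is called only when building the argument for something like ReadProcessMemory.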
Yes, thank you! I was also thinking that treating those pointers as addresses (with usize) is safe and makes more sense. But I was still curious whether working with those pointers directly is safe, and what role pointer provenance plays here (if it plays one at all).
I don't understand enough about what pointer provenance means, though a pointer obtained from an FFI function call might be opaque enough that it doesn't make a difference. In terms of the generated code, .offset() can generate a getelementptr inbounds instruction for LLVM, while .wrapping_offset() just generates getelementptr, so whatever happens is deep in the internals of LLVM's optimisations.
Note that the critical difference is whether the pointer stays within the Rust "allocation" -- the static or let or whatever.
let x = 4_u32;
unsafe { (&raw const x).byte_offset(4) }; // ok, going to the past-the-end pointer is defined
unsafe { (&raw const x).byte_offset(-1) }; // !!UB!! because it moves outside the `u32`
unsafe { (&raw const x).byte_offset(5) }; // !!UB!! because it moves outside the `u32`
(&raw const x).wrapping_byte_offset(42); // ok, because the wrapping lets it go anywhere
// (though it would be UB to read from after that offsetting)