Weird behavior of raw pointers and alloc on Windows

I have the following Rust code, and I'm unsure why the returned address (raw_block) is invalid:

let m = unsafe { &*(raw_data as *const Matrix<Double>) };
inverse(m).map(|inv| {
    let raw_block = unsafe { alloc_zeroed(Layout::new::<Matrix<Double>>()) };
    unsafe { *(raw_block as *mut Matrix<Double>) = inv };
    (
        Value::RawObject(String::from("Matrix"), raw_block),
        Environment::new(),
    )
})

This issue does not occur on Linux and macOS. Interestingly, if I use inv in the following way:

let m = unsafe { &*(raw_data as *const Matrix<Double>) };
inverse(m).map(|inv| {
    dbg!(&inv); // here
    let raw_block = unsafe { alloc_zeroed(Layout::new::<Matrix<Double>>()) };
    unsafe { *(raw_block as *mut Matrix<Double>) = inv };
    (
        Value::RawObject(String::from("Matrix"), raw_block),
        Environment::new(),
    )
})

the problem disappears, but this introduces a dbg! print statement. Why does this happen, and is there a way to ensure inv is valid without using dbg!?

You can find the specific code in this repository rswk/ksl_matrix.

Steps to reproduce:

  1. Clone the repository to your local machine: git clone https://github.com/kands-code/rswk
  2. Navigate to the local repository directory and build the ksl and ksl_matrix subprojects: cd rswk && cargo build -p ksl && cargo build -p ksl_matrix
  3. Install ksl: cargo install --path ksl --debug
  4. Copy the compiled library to the ksl_matrix directory: mkdir .\ksl_matrix\lib && cp .\target\debug\ksl_matrix.dll .\ksl_matrix\lib
  5. Change to the ksl_matrix directory and execute example.ksl: cd .\ksl_matrix && ksl example.ksl

Since this issue only occurs on Windows, the syntax used here is for Windows PowerShell. If you want to try obtaining the expected output on Linux or macOS, you need to replace ksl_matrix.dll with the corresponding libksl_matrix.so or libksl_matrix.dylib.

The expected output should be:

[1, 2, 1]
[2, 1, 0]
[-1, 1, 2]

[-0.6666666666666665, 1, 0.33333333333333326]
[1.3333333333333333, -1, -0.6666666666666666]
[-1, 1, 1]

()

Any insights on why this behavior occurs and how to resolve it would be greatly appreciated!

What happens if you run the code using miri?

The assignment *(raw_block as *mut Matrix<Double>) = inv is going to drop the Matrix<Double> instance already at raw_block before writing the new one. That memory is only filled with zeroes, which is not a valid Matrix<Double>, and hence you get UB.


Running cargo miri test for ksl_matrix gave this:

Meaning that ptr::write should be used instead of assignment, so the drop does not occur -- correct?

So what should be the right thing to do? The most straightforward approach I can think of to obtain an "arbitrary" Rust object is to use alloc to allocate memory and return the corresponding memory address.

I have tried using ptr::write as a replacement for direct assignment, but it doesn't seem to work in this case.

ptr::write() is the correct way to move a Rust value into an uninitialized place such as a new allocation. If it doesn’t seem to help, most likely you have further problems; you should fix this problem and then continue debugging.
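
To make the difference concrete, here is a minimal sketch of moving a value into a fresh allocation with ptr::write and taking it back out. It uses Vec<f64> as a stand-in for Matrix<Double>, and the helper names store_raw and take_raw are made up for illustration; they are not from the rswk repository.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};
use std::ptr;

/// Moves `value` into a fresh heap allocation and returns the untyped address.
/// (Hypothetical helper; Vec<f64> stands in for Matrix<Double>.)
fn store_raw(value: Vec<f64>) -> *mut u8 {
    let layout = Layout::new::<Vec<f64>>();
    // SAFETY: the layout of Vec<f64> has a non-zero size.
    let block = unsafe { alloc(layout) };
    if block.is_null() {
        handle_alloc_error(layout);
    }
    // ptr::write moves `value` in without reading or dropping the old bytes.
    // A plain assignment `*p = value` would first drop whatever "value" is
    // already at `p`, and for a new allocation that is not a valid Vec.
    unsafe { ptr::write(block as *mut Vec<f64>, value) };
    block
}

/// Moves the value back out and frees the allocation.
///
/// SAFETY: `block` must come from `store_raw` and must not be used afterwards.
unsafe fn take_raw(block: *mut u8) -> Vec<f64> {
    // ptr::read moves the Vec out; the block is now logically uninitialized.
    let value = unsafe { ptr::read(block as *const Vec<f64>) };
    unsafe { dealloc(block, Layout::new::<Vec<f64>>()) };
    value
}

fn main() {
    let block = store_raw(vec![1.0, 2.0]);
    let v = unsafe { take_raw(block) };
    println!("{v:?}");
}
```

Since nothing here touches files or dynamic loading, a snippet of this shape is the kind of thing that should be checkable in isolation.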


That's not running any tests, from that output? MIRI needs to run the code with the problem to report it.


If I use cargo miri run -p ksl -- .\ksl_matrix\example.ksl, I don't get stuck on the logic of ksl_matrix. Instead, I encounter an issue with reading files, specifically the error: "unsupported operation: can't call foreign function SetFilePointerEx on OS windows." I'm not sure how to work around this error, as it seems to be more of a limitation of Miri on Windows rather than an issue with my code logic.

And on Linux, it gets stuck on "unsupported operation: can't call foreign function dlopen on OS linux."

I am unsure how to continue debugging. The only thing I know for certain is that using dbg! to print inv allows the subsequent logic to function correctly. However, if I simply use inv in a statement like let _ = inv.to_string();, it has no effect. This might be because the Rust compiler optimizes that statement away. Nonetheless, I currently lack a clear direction for further debugging.

Miri is essentially an interpreter/VM for MIR that detects UB in your program. It has little support for operating-system-related features, especially file I/O, so you should run a minimal example containing the problematic code under Miri. Do NOT run your entire program under Miri; it won't work.

The dbg!() just masks the problem by chance; it doesn't "solve" the problem. This kind of unpredictability is a clear indication that you have UB somewhere in your code. You may see the unexpected result in one place, e.g. when writing to the pointer from alloc_zeroed(), but the root cause may be anywhere in your program. That's just the nature of unsoundness: it can have non-local effects on the whole program!

However, if I do not use libloading and simply call and use the returned pointer, there are no issues. Therefore, even the minimal reproducible code needs to include libloading, but Miri does not support this operation, so there is no minimal reproducible code that can be checked by Miri.

Miri has pretty good file I/O support actually. It's just that OS shims for Windows in general are lacking compared to Linux, because there are no dedicated Windows developers on the Miri team, only PRs from external contributors.

You may already know this. But note that even if you aren't able to reproduce a visible problem when removing libloading, Miri may still find UB problems that just don't happen to be manifesting. So running with Miri is still worthwhile.

What you should do to track down this problem is run the test in a typical debugger such as lldb or windbg. It should catch the issue if it's an invalid access.


The minimal reproducible code I can think of is as follows:

use std::{
    alloc::{Layout, alloc_zeroed, dealloc},
    ptr,
};

fn return_raw_ptr() -> *mut u8 {
    let block = unsafe { alloc_zeroed(Layout::new::<Vec<f64>>()) };
    let v = vec![1.0, 2.0];
    unsafe { ptr::write(block as *mut Vec<f64>, v) };
    block
}

fn show_content(ptr: *mut u8) {
    let v = unsafe { &*(ptr as *const Vec<f64>) };
    println!("{v:?}");
}

fn drop_ptr(ptr: *mut u8) { unsafe { dealloc(ptr, Layout::new::<Vec<f64>>()) }; }

fn main() {
    let block = return_raw_ptr();
    show_content(block);
    drop_ptr(block);
}

When setting $env:MIRIFLAGS="-Zmiri-ignore-leaks", the output of Miri is as follows:

> cargo miri run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.15s
     Running `C:\Users\kands\sdk\rustup\toolchains\nightly-x86_64-pc-windows-gnu\bin\cargo-miri.exe runner C:\Users\kands\workspace\rswk\target\miri\x86_64-pc-windows-gnu\debug\ksl_matrix.exe`
[1.0, 2.0]

The key observation about UB (undefined behavior) is that, more often than not, UB in part A of a program manifests in part B and gets triggered by part C, where all three parts are totally unrelated.

There is a reason why unsafe is named as it is. It is a very powerful tool, but with great power comes heavy responsibility. Someone writing code using unsafe needs to uphold all the implicit rules, which are quite complex in the case of Rust. The sad thing is that the compiler is not able to assist you, and you're on your own.

So, with lessons learned from countless UB bugs in different programming languages over decades, Rust developers mostly isolate unsafe code in a separate crate with primitives made as small as possible. Then there is often a crate which provides safe abstractions over those unsafe primitives, typically as small as possible too, and finally there is the actual crate which provides the intended functionality in an idiomatic way using the safe abstractions and pulls in a lot of different dependencies. All these layers are heavily secured by unit tests.

The solution is to declutter the code and bring it into a testable condition.
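
As a rough illustration of that layering, here is a sketch of a small safe wrapper that keeps all the unsafe in one place. Everything in it (the RawHandle type, its fields, drop_boxed) is hypothetical and not taken from the rswk code. It also assumes the handle is created and used within a single compiled binary; I am not claiming the TypeId check is meaningful across a dynamically loaded library boundary.

```rust
use std::any::TypeId;

/// Owning, type-erased handle to a heap value (hypothetical type).
pub struct RawHandle {
    ptr: *mut u8,
    type_id: TypeId,
    drop_fn: unsafe fn(*mut u8),
}

/// Rebuilds the Box<T> so that T's destructor runs and the memory is freed.
///
/// SAFETY: `ptr` must come from `Box::into_raw(Box::new(value_of_type_T))`.
unsafe fn drop_boxed<T>(ptr: *mut u8) {
    drop(unsafe { Box::from_raw(ptr as *mut T) });
}

impl RawHandle {
    /// Moves `value` to the heap and remembers how to drop it later.
    pub fn new<T: 'static>(value: T) -> Self {
        RawHandle {
            ptr: Box::into_raw(Box::new(value)) as *mut u8,
            type_id: TypeId::of::<T>(),
            drop_fn: drop_boxed::<T>,
        }
    }

    /// Returns a reference only if the stored value really is a `T`.
    pub fn get<T: 'static>(&self) -> Option<&T> {
        if self.type_id == TypeId::of::<T>() {
            // SAFETY: the type check above guarantees `ptr` points to a live T.
            Some(unsafe { &*(self.ptr as *const T) })
        } else {
            None
        }
    }
}

impl Drop for RawHandle {
    fn drop(&mut self) {
        // SAFETY: `ptr` was created in `new` together with the matching `drop_fn`.
        unsafe { (self.drop_fn)(self.ptr) }
    }
}

fn main() {
    let handle = RawHandle::new(vec![1.0_f64, 2.0]);
    println!("{:?}", handle.get::<Vec<f64>>());
    println!("{:?}", handle.get::<String>());
}
```

The point of the design is that callers never write unsafe themselves, and the three unsafe spots are small enough to test and run under Miri on their own. Within one binary this is roughly what Box<dyn Any> already gives you.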

It seems like writing non-trivial unsafe code can still pose some challenges at this stage. For example, consider the (wrong) code:

It could easily be rewritten without using any unsafe:

    let raw_block = Box::into_raw(Box::new(inv));

And I'm pretty sure there are a lot of other uses of unsafe in the code that are questionable.
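
For completeness, a sketch of the full round trip with Box, including the type-erased *mut u8 that the interpreter stores. The Matrix struct here is a simplified stand-in for Matrix<Double>, and into_opaque, borrow_opaque and free_opaque are illustrative names only. Creating the pointer needs no unsafe; only reading it back and freeing it do.

```rust
/// Simplified stand-in for Matrix<Double> (hypothetical layout).
struct Matrix {
    rows: usize,
    cols: usize,
    data: Vec<f64>,
}

/// Moves the matrix to the heap and erases its type. No unsafe needed.
fn into_opaque(m: Matrix) -> *mut u8 {
    Box::into_raw(Box::new(m)) as *mut u8
}

/// Borrows the matrix behind an opaque pointer.
///
/// SAFETY: `p` must come from `into_opaque` and must not have been freed.
unsafe fn borrow_opaque<'a>(p: *mut u8) -> &'a Matrix {
    unsafe { &*(p as *const Matrix) }
}

/// Runs the destructor and frees the memory in one step.
///
/// SAFETY: `p` must come from `into_opaque` and must be freed exactly once.
unsafe fn free_opaque(p: *mut u8) {
    drop(unsafe { Box::from_raw(p as *mut Matrix) });
}

fn main() {
    let p = into_opaque(Matrix { rows: 1, cols: 2, data: vec![1.0, 2.0] });
    let m = unsafe { borrow_opaque(p) };
    println!("{}x{} {:?}", m.rows, m.cols, m.data);
    unsafe { free_opaque(p) };
}
```

Because Box::from_raw rebuilds the Box, dropping it frees both the matrix's own buffer and the allocation that held the struct, so there is no separate dealloc call to get wrong.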

Yes, indeed. I repeatedly tried running it with lldb and found that the errors varied each time, but most of them were related to atomic, sync, and rayon_core. This is likely because matrix_ks (which is a dependency of the ksl_matrix library) relies on rayon, and there may still be some issues with dynamic loading on Windows. I'm not entirely sure about the specific cause of the errors, but I can reasonably rule out issues within the ksl_matrix code itself. In other words, these errors are unrelated to the parts of the code I initially suspected, and the specific errors occur during the inverse step.

Snippets from lldb:
PS C:\Users\kands\workspace\rswk\ksl_matrix> lldb ksl -- .\example.ksl
(lldb) target create "ksl"
Current executable set to 'C:\Users\kands\sdk\cargo\bin\ksl.exe' (x86_64).
(lldb) settings set -- target.run-args  ".\\example.ksl"
warning: (x86_64) C:\Users\kands\sdk\cargo\bin\ksl.exe unable to locate separate debug file (dwo, dwp). Debugging will be degraded.
(lldb) r
(lldb) Process 5236 launched: 'C:\Users\kands\sdk\cargo\bin\ksl.exe' (x86_64)
warning: (x86_64) C:\Users\kands\workspace\rswk\ksl_matrix\lib\ksl_matrix.dll unable to locate separate debug file (dwo, dwp). Debugging will be degraded.
Process 5236 exited with status = 0 (0x00000000)
(lldb) r
Process 4836 launched: 'C:\Users\kands\sdk\cargo\bin\ksl.exe' (x86_64)
Process 4836 stopped
* thread #15, stop reason = Exception 0xc0000005 encountered at address 0x7ff9efb01602: User-mode data execution prevention (DEP) violation at location 0x7ff9efb01602
    frame #0: 0x00007ff9efb01602 ksl_matrix.dll`std::sys::sync::condvar::futex::Condvar::wait::he2b8c0b22495dfed at futex.rs:67:9
(lldb) r
There is a running process, kill it and restart?: [Y/n] y
Process 4836 exited with status = 0 (0x00000000)
(lldb) Process 8416 launched: 'C:\Users\kands\sdk\cargo\bin\ksl.exe' (x86_64)
Process 8416 exited with status = 0 (0x00000000)
(lldb) t
error: Process must be launched.
(lldb) r
Process 7644 launched: 'C:\Users\kands\sdk\cargo\bin\ksl.exe' (x86_64)
Process 7644 exited with status = 0 (0x00000000)
(lldb) r
Process 6892 launched: 'C:\Users\kands\sdk\cargo\bin\ksl.exe' (x86_64)
Process 6892 stopped
* thread #13, stop reason = Exception 0xc0000005 encountered at address 0x7ff9efa2c807: User-mode data execution prevention (DEP) violation at location 0x7ff9efa2c807
    frame #0: 0x00007ff9efa2c807 ksl_matrix.dll`rayon_core::sleep::Sleep::no_work_found::hd753834b7a9023fe(self=0x0000023fd79f03d8, idle_state=0x000000f44edfee50, latch=0x0000023fd79ef400, has_injected_jobs={closure_env#0} @ 0x000000f44edfed70) at mod.rs:101:13
   98           } else if idle_state.rounds == ROUNDS_UNTIL_SLEEPY {
   99               idle_state.jobs_counter = self.announce_sleepy();
   100              idle_state.rounds += 1;
-> 101              thread::yield_now();
   102          } else if idle_state.rounds < ROUNDS_UNTIL_SLEEPING {
   103              idle_state.rounds += 1;
   104              thread::yield_now();
(lldb)