LD_PRELOAD with .init_array: fatal runtime error: thread::set_current should only be called once per thread

Hi all,

I am trying to set up a dylib project that is intended to work as an LD_PRELOAD shared object. While the unit tests of what I'm implementing generally work fine, I'm having issues when running a simple integration test (one of those in the ./tests directory) of this form:

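// Assumed imports; `assert()` on `Output` comes from assert_cmd's OutputAssertExt.
use assert_cmd::prelude::*;
use std::path::PathBuf;
use std::process::Command;

fn preload_lib_path() -> PathBuf {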
    let path = [
        env!("CARGO_MANIFEST_DIR"),
        format!("target/debug/lib{}.so", env!("CARGO_PKG_NAME")).as_str(),
    ]
    .into_iter()
    .collect::<PathBuf>();
    // File must exist for the integration tests that rely on it to succeed! Just in case...
    assert!(path.is_file(), "Preload .so does not exist at {path:?}. Please compile the project first and then run these tests!");
    path
}

#[test]
fn preload_works_via_logs() {
    // We create a random command with `assert_cmd` and pass LD_PRELOAD as an env var to it
    let command = Command::new("ls")
        .arg("-l")
        .env("LD_PRELOAD", preload_lib_path())
        .output()
        .expect("failed to execute process");

    // Running an innocuous command expected to work without issues
    command
        .assert()
        .success();
    // ... there are some example logs that I print, so I test that they are emitted here
}

I'm trying to set up a minimal reproducer but haven't managed it yet (I'll share it as soon as I do). The basic structure of the project has an entrypoint like this one:

#[cfg_attr(target_os = "linux", link_section = ".init_array")]
pub static LD_PRELOAD_INITIALIZE: extern "C" fn() = self::ld_preload_initialize;

extern "C" fn ld_preload_initialize() { /* logic here */ }

The entrypoint performs operations such as:

  1. Setting up a logger with env_logger.
  2. Reading files and parsing them with serde.
  3. Manipulating a non-trivial collection (HashMap) inside a Mutex.

Although the emitted logs indicate that ld_preload_initialize reaches its end, the actual command executed in the test (ls) doesn't get to run, and instead I get this error:

fatal runtime error: thread::set_current should only be called once per thread
error: test failed, to rerun pass `--lib`

Caused by:
  process didn't exit successfully: `/<PROJECT_PATH>/target/debug/deps/<PROJECT_NAME>-5f5c39d631f618e1` (signal: 6, SIGABRT: process abort signal)

I don't spawn any threads explicitly in all the logic (but I assume env_logger might), nor call set_current.

I have seen that if I compile the project with the static conditionally compiled to be absent in test builds the integration tests actually succeed:

#[cfg(not(test))]
#[cfg_attr(target_os = "linux", link_section = ".init_array")]
pub static LD_PRELOAD_INITIALIZE: extern "C" fn() = self::ld_preload_initialize;

extern "C" fn ld_preload_initialize() { ... }

So I suspect that the function in this .init_array section is somehow being initialised more than once when running the "test binary"... but I don't know if testing like this is even advisable, or whether it should work at all. The code I'm writing tries its best not to panic (handling all Result::Err, Option::None and so on) and doesn't include any explicit unsafe code.

As I said above, I'm trying to set up a minimal reproducer but I haven't been able to do so. As soon as I have it I'll either add more data here or even post the solution if I find it.

It's my first time exploring this, so there's quite a bit that I don't know yet. Does anything in what I report point to a possible root cause? Are there any recommendations for developing programs like this that leverage .init_array and are intended to run as LD_PRELOAD shared objects? Let me know if there's more information I can provide to help figure out the issue; I'll share whatever details I can.

Thanks a lot!

So, trying to set up a minimal reproducer actually helped because what I saw is that the behavior did change between Rust 1.80.1 and 1.81.0.

You can see it in action in this repo, which contains instructions to reproduce it quickly with Docker (I tried this on a Mac). The actual code is only a couple of statements, with no tests whatsoever:

Reproduce quickly with docker (aarch64)

$ rm -rf target ; docker run --rm --volume $(pwd):/app rust:1.80.1 bash -c "cd app ; cargo test"
   Compiling preload-test v0.1.0 (/app)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.46s
     Running unittests src/lib.rs (target/debug/deps/preload_test-ff929676b3aef50e)
HOLA!
{"key3": "value3", "key1": "value1", "key2": "value2"}
Bye!

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
$ rm -rf target ; docker run --rm --volume $(pwd):/app rust:1.81.0 bash -c "cd app ; cargo test"
   Compiling preload-test v0.1.0 (/app)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.49s
     Running unittests src/lib.rs (target/debug/deps/preload_test-0a654d8cd2033440)
HOLA!
{"key2": "value2", "key3": "value3", "key1": "value1"}
Bye!
fatal runtime error: thread::set_current should only be called once per thread
error: test failed, to rerun pass `--lib`

Caused by:
  process didn't exit successfully: `/app/target/debug/deps/preload_test-0a654d8cd2033440` (signal: 6, SIGABRT: process abort signal)

Is this some kind of regression? Where can I report this in case it hasn't been reported yet?

The Rust standard library is not supported in functions you place in .init_array.

Given that the test runner ended with SIGABRT, I thought this was related to "Abort on uncaught panics in extern "C" functions", added in 1.81.0. However, wrapping all the work done inside the .init_array-linked function in catch_unwind does not solve the issue, even though I was already making sure no explicit sources of panics were present.

Not sure what would be the issue here.
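
For reference, the catch_unwind wrapping described above might look roughly like the following. This is a minimal sketch, not the actual project code: `run_guarded` and `guarded_initialize` are hypothetical names. It only contains Rust panics; an abort raised by the runtime itself never passes through catch_unwind.

```rust
use std::panic::{self, UnwindSafe};

/// Runs `work` and reports whether it finished without panicking.
/// (Hypothetical helper, named for illustration only.)
fn run_guarded<F: FnOnce() + UnwindSafe>(work: F) -> bool {
    panic::catch_unwind(work).is_ok()
}

/// Hypothetical constructor body: a panic is contained here instead of
/// unwinding across the extern "C" boundary.
extern "C" fn guarded_initialize() {
    let finished = run_guarded(|| {
        // initialization logic would go here
    });
    if !finished {
        // Deliberately nothing: the host process continues without the setup.
    }
}
```

A static placed in .init_array would then point at `guarded_initialize` rather than at the unguarded function.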

Ooh I see, then I guess it working before was just luck. I'll try modifying the reproducer to not use std and see if it helps.

From what was commented here I was under the impression that it could possibly work without issues. Sad to see it doesn't... is this documented anywhere?

EDIT: corrected wrong link, sorry!

In the crate-level docs for std:

Use before and after main()

Many parts of the standard library are expected to work before and after main(); but this is not guaranteed or ensured by tests. It is recommended that you write your own tests and run them on each platform you wish to support. This means that use of std before/after main, especially of features that interact with the OS or global state, is exempted from stability and portability guarantees and instead only provided on a best-effort basis. Nevertheless bug reports are appreciated.

On the other hand core and alloc are most likely to work in such environments with the caveat that any hookable behavior such as panics, oom handling or allocators will also depend on the compatibility of the hooks.

I think 1.81 included a change to how thread IDs are assigned, thus resulting in some library call making the main thread get initialized as if it were a thread started by FFI before the Rust runtime initialization gets a chance to mark it as the main thread. Or that change just enabled set_current to notice that this was already occurring.

This isn't a panic, it's a “fatal runtime error” (std did a rtabort!). An abort due to extern "C" will have the usual “thread panicked” message along with noting “panic in a function that cannot unwind.” (Assuming everything works as intended, anyway; unwinding can fail for other reasons as well outside Rust's control.)

Thanks! For the time being, replacing the logging with calls to the libc-print macros has made it run when built with 1.81.0, but I'll be on the lookout for any additional issues.

Would be great to see that thread ID assignment change...

Also thanks for the clarifications regarding panic/abort. When I saw the release notes for 1.81.0, although it didn't fully make sense in my head, I somehow assumed that any extern "C" context not using catch_unwind could potentially cause the code to be aborted. Glad to see I wasn't completely lost there.
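
As an illustration of the same idea as the libc-print workaround (printing without going through std's locked Stdout), here is a minimal Unix-only sketch that writes directly to a file descriptor. `write_fd` is a hypothetical helper, not part of libc-print, and it rests on the assumption that a plain File write does not touch the per-thread state that println! needs. Since use of std before main is only best-effort, it would have to be tested on each target.

```rust
use std::fs::File;
use std::io::Write;
use std::mem::ManuallyDrop;
use std::os::fd::FromRawFd;

/// Writes `msg` straight to the descriptor `fd`, bypassing `std::io::stdout()`.
/// (Hypothetical helper; the caller must pass a descriptor that is open.)
fn write_fd(fd: i32, msg: &[u8]) -> std::io::Result<()> {
    // SAFETY (assumed): `fd` stays open for the duration of this call.
    // ManuallyDrop stops File's destructor from closing a descriptor
    // that this function does not own.
    let mut out = ManuallyDrop::new(unsafe { File::from_raw_fd(fd) });
    out.write_all(msg)
}
```

Inside the constructor, `let _ = write_fd(1, b"HOLA!\n");` would stand in for `println!("HOLA!")`.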

I suppose something in the std IO machinery is using ReentrantLock, if this is actually the correct cause? Although that's purely just a guess.

Yes, StdoutLock uses a ReentrantLock internally, so println!() tries to call current_id() which somehow causes the abort. I don't understand however how CURRENT could have been set to a value without also setting CURRENT_ID. Afaict, setting one of those two always sets the other as well (which is an explicit invariant).

Oh I see it now. The call chain is println!() -> Stdout::lock() -> ReentrantLock::lock() -> thread::current_id() -> thread::current() -> thread::try_current() which initializes CURRENT and CURRENT_ID, calling set_current(Thread::new_unnamed()), all before the main function is called. Then, when the runtime starts up, it calls rt::init() which tries to set_current(Thread::new_main()) which then leads to the abort. This didn't happen in 1.80.1 because ReentrantLock::lock only used a TLS-address, so it didn't need CURRENT to be initialized. I'm still not sure why setting LD_PRELOAD changes this behavior though.
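
The sequence described above can be modelled with a toy version of the set-once invariant. This is only a sketch: the names mirror the ones mentioned in this thread, but none of it is std's actual implementation, and where this model returns an Err the real runtime aborts.

```rust
use std::cell::OnceCell;

thread_local! {
    // Toy stand-in for std's per-thread CURRENT handle.
    static CURRENT: OnceCell<&'static str> = OnceCell::new();
}

/// Records the handle for this thread. Fails, returning the rejected name,
/// if a handle was already recorded.
fn set_current(name: &'static str) -> Result<(), &'static str> {
    CURRENT.with(|cell| cell.set(name))
}

/// Returns this thread's handle, lazily creating an unnamed one on first
/// use (what happens when println! runs before the runtime has started).
fn current() -> &'static str {
    CURRENT.with(|cell| *cell.get_or_init(|| "unnamed"))
}

/// Stand-in for runtime startup, which registers the main thread.
fn rt_init() -> Result<(), &'static str> {
    set_current("main")
}
```

Calling `current()` before `rt_init()` on the same thread makes `rt_init()` fail, which corresponds to the abort seen with 1.81.0. In the opposite order both calls succeed.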

Without the preload nothing gets run before main. (The difference isn't using LD_PRELOAD, it's OP using .init_array to initialize their dylib when loaded via LD_PRELOAD.)

But it only crashes without the LD_PRELOAD? According to "`cargo test` for an `LD_PRELOAD` `dylib` causes runtime error in 1.81.0 · Issue #130210 · rust-lang/rust · GitHub", setting LD_PRELOAD prevents the abort.

So, if I got this right, the change implies that calling println! initializes the main thread, so when main is actually reached the main thread initialization is attempted again, fails, and the whole thing aborts?

Then this would have an impact whenever I define both a function in .init_array and an entrypoint (I assume in the same executable binary), such as when running the tests, because a test runner executable will contain both the entrypoint and the function linked to .init_array. When compiling the actual library and just using it as LD_PRELOAD, this wouldn't happen?

Like, the runtime loaded for a separate library as LD_PRELOAD would not be the same as the one for the binary I would execute. Did I get this right?

My worry is that there are some details I might be missing in the way I'm using this dylib that could cause this runtime error in the actual usage of the library. Extensive testing notwithstanding.

That's not quite what was said: using LD_PRELOAD to load the dylib (presumably into an executable with non-Rust entry point, or at least a non-conflicting Rust runtime entry point) was working as expected, but tests were failing unexpectedly (due to the Rust entry point) despite the functionality working fine otherwise.

Probably. With a cdylib crate type, the dylib output is supposed to be entirely self-contained. However, I can't be certain that symbols won't ever get unified; the thing you'll want to test is preloading on top of a Rust binary compiled with the exact same toolchain.

If the executable entry point isn't a Rust binary crate, then rt::init() won't ever happen and an .init_array call is essentially indistinguishable from a call at the start of the foreign language main.
