LD_PRELOAD not generating traceback on panic/error

sarvi · September 19, 2020, 6:56am

New to rust.
I am working on an LD_PRELOAD program that propagates itselves by intercepting exec family of calls and to tracks filesystem acceses by intercepting libc open calls.
When running it on a very large build, I am running into a few instances of

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 9, kind: Other, message: "Bad file descriptor" }', src/tracker.
rs:186:113
stack backtrace:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 9, kind: Other, message: "Bad file descriptor" }', src/tracker.
rs:186:113

I am propagating RUST_BACKTRACE=1 as well, along with LD_PRELOAD and LD_LIBRARY
_PATH when intercepting exec.
If dont propagate RUST_BACKTRACE=1, I would see additional message, suggesting I use RUST_BACKTRACE=1 to see traceback. In this case I am seeing no such messages. Just this one line.

Does RUST offer anything else that I could use debug this further? to understand how we got here.

The specific line of error comes from

(&self.file).write_all(format!("{} {} {}{}\n", self.uuid, op, contin, &value[ind..max]).as_bytes()).unwrap();

Which is what writes tracing informaiton about filesystem access to a trace file specified in WISK_TRACKFILE=/a/b/c.file

I know this, because I did run into this same issue a day ealier. I was opening this file in initialization code that happens in the libwisktrack.so library constructor function.
posix_spawn, forks then execs, and it looks like the exec closes all file descriptors on exec, and I suspect it was closing the open file descriptor of the library during the exec. Once I removed the initialization code from the constructor, that got resolved.

But I am seeing this in a different location.

During the previous instance it produced a traceback that helped me understand the context.
Love this capability of RUST.

Except in the current cases, I dont get the traceback.
And am trying to understand why I am seeing the error but no actual traceback and what other tools RUST might have to help me here.

Michael-F-Bryan · September 19, 2020, 8:43am

I'd first double-check the code that panics is actually getting RUST_BACKTRACE=1. The easiest way to do that would be adding something like assert_eq!(std::env::var("RUST_BACKTRACE").unwrap(), "1") on the line before the bad write. That way if the variable isn't being propagated for whatever reason, it'll crash on that line and you'll be able to troubleshoot further.

It's odd to get a "Bad file descriptor" error when writing to a std::fs::File. Usually the File object will make sure a file descriptor isn't closed until it gets dropped... Is your LD_PRELOAD-ed function doing things with file descriptors directly? Potentially trying to use a file descriptor that came from a different process?

If you just care about the backtrace for debugging purposes, you could use the backtrace crate to capture a new backtrace, then print the Backtrace to stderr.

Is there any chance something is consuming stdout or stderr? It could be that backtraces are being printed, but they're being redirected to another file/process.

You should also be able to tell GDB to break on your particular function (maybe something like this). I don't often need to reach for a debugger when writing Rust code, but it can come in handy in these sorts of situations.

As an aside, it's probably more concise to use writeln!(self.file, "{} {} {}{}", self.uuid, op, contin, &value[ind..max]) than going via write_all(format!(...).bytes()). You skip an allocation and it's more concise.

You can also try to handle the possible write error with a match statement instead of using unwrap() to blow up at runtime (relevant section from The Book).

sarvi · September 21, 2020, 8:19am

In corporated the assert you suuggested. Found that some of the scripts were clearing the environment and hence RUST_BACKTRACE=1 as well.
I then coded to always env::set_var("RUST_BACKTRACE","1");
After that the backtrace is always printed.

On the topic of "Bad file descriptor" though, I working with file descriptors to translate fd to path

fn fd2path (fd : c_int ) -> PathBuf {
    let f = unsafe { File::from_raw_fd(fd) };
    let fp = f.path().unwrap();
    f.into_raw_fd();
    fp
}

This is the only raw file descriptor handling. And this API was onnly used to map fd to path for logging the paths associated with intercepted API like open(), or symlinkat(), etc.

I went ahead and disabled the above code as well. Just to test things.

and I still ran into the traceback of Bad file descriptor as follows

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 9, kind: Other, message: "Bad file descriptor" }', src/tracker.rs:223:113
stack backtrace:
   0: rust_begin_unwind
             at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/std/src/panicking.rs:475
   1: core::panicking::panic_fmt
             at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/core/src/panicking.rs:85
   2: core::option::expect_none_failed
             at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/core/src/option.rs:1219
   3: core::result::Result<T,E>::unwrap
             at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/core/src/result.rs:1005
   4: wisktrack::tracker::Tracker::report
             at /ws/sarvi-sjc/wisktrack/src/tracker.rs:223
   5: wisktrack::tracker::Tracker::reportexecv
             at /ws/sarvi-sjc/wisktrack/src/tracker.rs:341
   6: wisktrack::my_execv::{{closure}}
             at /ws/sarvi-sjc/wisktrack/src/lib.rs:157
   7: std::thread::local::LocalKey<T>::try_with
             at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/std/src/thread/local.rs:267
   8: std::thread::local::LocalKey<T>::with
             at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/std/src/thread/local.rs:243
   9: wisktrack::my_execv
             at /users/sarvi/.cargo/git/checkouts/redhook-16e0a55232cdd48a/293568e/src/ld_preload.rs:87
  10: wisktrack::execv::{{closure}}
             at /users/sarvi/.cargo/git/checkouts/redhook-16e0a55232cdd48a/293568e/src/ld_preload.rs:104
  11: std::panicking::try::do_call
             at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/std/src/panicking.rs:373
  12: __rust_try
  13: std::panicking::try
             at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/std/src/panicking.rs:337
  14: std::panic::catch_unwind
             at /rustc/7e6d6e5f535321c2223f044caba16f97b825009c/library/std/src/panic.rs:394
  15: execv
             at /users/sarvi/.cargo/git/checkouts/redhook-16e0a55232cdd48a/293568e/src/ld_preload.rs:104
  16: <unknown>
  17: <unknown>
  18: <unknown>
  19: <unknown>
  20: <unknown>
  21: <unknown>
  22: <unknown>
  23: <unknown>
  24: <unknown>
  25: <unknown>
  26: <unknown>
  27: <unknown>
  28: <unknown>
  29: <unknown>
  30: <unknown>
  31: <unknown>
  32: <unknown>
  33: <unknown>
  34: <unknown>
  35: <unknown>
  36: <unknown>
  37: <unknown>
  38: <unknown>
  39: <unknown>
  40: <unknown>
  41: <unknown>
  42: <unknown>
  43: <unknown>
  44: <unknown>
  45: <unknown>
  46: <unknown>
 47: <unknown>
  48: <unknown>
  49: <unknown>
  50: <unknown>
  51: <unknown>
  52: <unknown>
  53: __libc_start_main
  54: <unknown>
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

So if rust indeed does take care of managing the file descripter associaited with a file object, I am not sure where the the "file" object is being deleted. Considering the file object is part of a global lazy_static definition.

pub struct Tracker {
    pub wsroot: String,
    pub cwd: String,
    pub filename: String,
    pub file: File,
    pub uuid: String,
    pub puuid: String,
    pub pid: String,
    pub env : HashMap<String, String>,
    pub wiskfields : Vec<(String, String)>,
    pub cmdline: Vec<String>,
}

thread_local! {
    #[allow(nonstandard_style)]
    pub static MY_DISPATCH_initialized: ::core::cell::Cell<bool> = false.into();
}

lazy_static! {
    pub static ref TRACKER : Tracker = Tracker::init();
}

thread_local! {
    pub static MY_DISPATCH: (bool, Dispatch, WorkerGuard) = {
        // debug(format_args!("Trace Initialize\n"));
        let ret = make_dispatch("WISK_TRACE");
        if ret.0 {
            with_default(&ret.1, || {
                lazy_static::initialize(&TRACKER);
            });
        } else {
            lazy_static::initialize(&TRACKER);
        }
        MY_DISPATCH_initialized.with(|it| it.set(true));
        // debug(format_args!("Trace Initialize: Complete\n"));
        ret
    };
}

pub fn make_dispatch(tracevar: &str) -> (bool, Dispatch, WorkerGuard) {
    let file_appender;
    let tracing;
    if let Ok(tracefile) =  env::var(tracevar) {
        file_appender = tracing_appender::rolling::never("", tracefile);
        tracing = true
    } else {
        file_appender = tracing_appender::rolling::never("", "/dev/null");
        tracing = false
    }
    let (non_blocking, guard) = tracing_appender::non_blocking(file_appender);
    let subscriber = FmtSubscriber::builder()
        .with_max_level(Level::TRACE)
        .with_writer(non_blocking)
        .finish();
    (tracing, Dispatch::new(subscriber), guard)
}

The one problem I see with the above code is while 2 of the above is thread local, TRACKER is lazey_static. And I suspect its initialization might need protection from initialization from multiple threads.

That said, its intialization is currenntly being done in the LD_PRELOAD library constructor function as follows

#[ctor]
fn cfoo() {
    // debug(format_args!("Constructor: {}\n", std::process::id()));
    // real!(readlink);
    redhook::initialize();
    MY_DISPATCH.with(|(tracing, my_dispatch, _guard)| { });
    // debug(format_args!("Constructor Complete\n"));
}

I would think the no threads have started at this point. Is it possible for a process to start with multiple threads before the constructor of an LD_PRELOAD library ?

Update: May be I am coding my intent wrong in rust. My goal is to have a common global initialization for the follow 2 items

tokio-tracing - for tracing/debugging issues. That trace all operations to a common log file, across all processes execed/propagated from the top level make program that is being tracked.
tracking file - Where I write data associated with each libc API I intercept in my LD_PRELOAD library. Again a common file across processes execed/propagated from the top level make program that is being tracked

In a multi threaded program that is exec'ed, Then the tokio tracing code would probably open the same log file in append mode multiple times, I would venture to guess.

while the Tracker object would get initialized once globably and the "file" object it contains and the fd would be used by multiple threads within that process that might be invoking intercepted libc API that will be writing tracking data to the same tracking file/file-descriptor from different threads.

Not sure if this is not acceptable in RUST to be writing to the same file multiple times. Or if there is a bug in the lazy_static crate that destroys TRACKER object. If that is the case, I would have expected the tracker object to be deleted/gone if the file/file descriptor within it is deleted and bad.

So a bit lost.

Is there problem with opening the same file multiple times from different threads? as is happenning tokio tracing to that is going to a single file across multiple threads and multiple processes
Is there a problem with writing tracking data to a single tracking file opened globally as a single "file" object with a single file descriptor, and potentially being written to from multiple threads within that process?

sarvi · September 26, 2020, 6:20am

Problem solved.
Root caused to some programs using dup2 use specific file descriptors fd=3 in this case for some reason.
This was essentially overwriting the automatically picked/opened fd when opening tracking file.

Also these programs were essentially then closing these duped hard coded fd making them bad file descriptors as part of the fork and exec cleanup.

Solution was for my LD_PRELOAD program to open and dup2 the fs to some FD high up in the 800s

system · December 25, 2020, 6:20am

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Is "Bad File Descriptor" a possible error in RUST when all you are doing is opening and writing to a file? help	3	2025	December 25, 2020
RUST_BACKTRACE=1 did not work help	5	8483	March 26, 2020
Is there any logging library can log stack traceback when panicked or assertion failed? help	5	466	September 28, 2023
Debug 'thread panicked while panicking'? help	7	4981	January 12, 2023
How to interpret backtrace generated by a panic? help	5	628	March 5, 2023

LD_PRELOAD not generating traceback on panic/error

Related Topics