Binary won't run - FFI, Alpine, musl and C

Hi all,

I'm trying to debug a build of my project on Alpine. I'm just packaging up a new version, and everything builds correctly. Even abuild is happy; see the merge request "testing/sentrypeer: upgrade to 4.0.3 (!81324)" in alpine/aports on GitLab.

However, I think the library I link to, which is written in Rust, is causing some issues. Everything looks good with ldd:

ldd ./sentrypeer 
	/lib/ld-musl-x86_64.so.1 (0x7f7fe5c66000)
	libpcre2-8.so.0 => /usr/lib/libpcre2-8.so.0 (0x7f7fe5bb5000)
	libmicrohttpd.so.12 => /usr/lib/libmicrohttpd.so.12 (0x7f7fe5b92000)
	libcurl.so.4 => /usr/lib/libcurl.so.4 (0x7f7fe5562000)
	libjansson.so.4 => /usr/lib/libjansson.so.4 (0x7f7fe5b84000)
	libuuid.so.1 => /usr/lib/libuuid.so.1 (0x7f7fe5559000)
	libsqlite3.so.0 => /usr/lib/libsqlite3.so.0 (0x7f7fe53d3000)
	libosipparser2.so.15 => /usr/lib/libosipparser2.so.15 (0x7f7fe539d000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x7f7fe5371000)
	libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7f7fe5c66000)
	libgnutls.so.30 => /usr/lib/libgnutls.so.30 (0x7f7fe519a000)
	libcares.so.2 => /usr/lib/libcares.so.2 (0x7f7fe515f000)
	libnghttp2.so.14 => /usr/lib/libnghttp2.so.14 (0x7f7fe513d000)
	libidn2.so.0 => /usr/lib/libidn2.so.0 (0x7f7fe510b000)
	libpsl.so.5 => /usr/lib/libpsl.so.5 (0x7f7fe50f7000)
	libssl.so.3 => /usr/lib/libssl.so.3 (0x7f7fe5034000)
	libcrypto.so.3 => /usr/lib/libcrypto.so.3 (0x7f7fe4a00000)
	libzstd.so.1 => /usr/lib/libzstd.so.1 (0x7f7fe4f83000)
	libbrotlidec.so.1 => /usr/lib/libbrotlidec.so.1 (0x7f7fe4f74000)
	libz.so.1 => /usr/lib/libz.so.1 (0x7f7fe4f59000)
	libp11-kit.so.0 => /usr/lib/libp11-kit.so.0 (0x7f7fe48cf000)
	libunistring.so.5 => /usr/lib/libunistring.so.5 (0x7f7fe46fc000)
	libtasn1.so.6 => /usr/lib/libtasn1.so.6 (0x7f7fe4f47000)
	libhogweed.so.6 => /usr/lib/libhogweed.so.6 (0x7f7fe4eff000)
	libnettle.so.8 => /usr/lib/libnettle.so.8 (0x7f7fe4eb2000)
	libgmp.so.10 => /usr/lib/libgmp.so.10 (0x7f7fe4692000)
	libbrotlicommon.so.1 => /usr/lib/libbrotlicommon.so.1 (0x7f7fe4e8f000)
	libffi.so.8 => /usr/lib/libffi.so.8 (0x7f7fe4e85000)

But when I execute it, nothing happens, and gdb doesn't show anything obvious:

(gdb) run
Starting program: sentrypeer 
[New LWP 21975]
[New LWP 21976]
^C
Thread 1 "sentrypeer" received signal SIGINT, Interrupt.
__cp_end () at src/thread/x86_64/syscall_cp.s:29
warning: 29	src/thread/x86_64/syscall_cp.s: No such file or directory
(gdb) bt
#0  __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1  0x00007ffff7fbecb5 in __syscall_cp_c (nr=35, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>, y=<optimized out>, z=0) at src/thread/pthread_cancel.c:33
#2  0x00007ffff7fc1a35 in __clock_nanosleep (clk=0, flags=0, req=0x0, rem=0x0) at src/time/clock_nanosleep.c:33
#3  __clock_nanosleep (clk=clk@entry=0, flags=flags@entry=0, req=req@entry=0x7fffffffea70, rem=rem@entry=0x7fffffffea70) at src/time/clock_nanosleep.c:8
#4  0x00007ffff7fc1df4 in nanosleep (req=req@entry=0x7fffffffea70, rem=rem@entry=0x7fffffffea70) at src/time/nanosleep.c:6
#5  0x00007ffff7fc372c in sleep (seconds=seconds@entry=1) at src/unistd/sleep.c:7
#6  0x000055555558b91a in main (argc=<optimized out>, argv=<optimized out>) at src/sentrypeer.c:140
(gdb) quit
A debugging session is active.

I'm not sure where to start. If I disable the Rust part, it builds and runs as normal, so the problem is definitely somewhere between those two.

It works on every other distro I use (glibc, not musl).

What other tools could I use to debug where it's stuck?

Thanks,

Gavin.

Here's my build.rs

Without further investigation, since it wasn't crashing, my first guess was that there might be some kind of deadlock in the code. Do you use multiple threads? Check all the synchronization points, e.g. Mutex::lock().
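One way to act on that advice without a full tracer is to probe suspect locks without blocking. Below is a minimal std-only sketch; describe_lock_state() is a hypothetical helper, not something from SentryPeer. It relies on std::sync::Mutex::try_lock(), which returns immediately with WouldBlock instead of waiting the way lock() does:

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical debugging helper: report a mutex's state without blocking.
/// `lock()` would wait forever on a deadlocked mutex; `try_lock()` returns at once.
fn describe_lock_state<T>(m: &Mutex<T>) -> &'static str {
    match m.try_lock() {
        // Acquired: the guard is dropped at the end of this arm, releasing the lock.
        Ok(_guard) => "free",
        // Someone (possibly this very thread) already holds the lock.
        Err(TryLockError::WouldBlock) => "held",
        // A previous holder panicked while holding the lock.
        Err(TryLockError::Poisoned(_)) => "poisoned",
    }
}
```

Logging this next to each synchronization point would show which lock a stuck thread is waiting on, assuming the hang really is lock contention.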

Oh, I missed the part where it works with glibc but not with musl. So it must be some discrepancy between musl and glibc. Or maybe it's the interaction between pthreads and musl?

By the way, can the Rust code run independently of the C part, e.g. as a unit test or an integration test?

Also, what's the output if you enable tracing, e.g. using tracing_subscriber::EnvFilter and setting the env variable RUST_LOG=trace? At the very least you should see lots of internal trace messages from tokio; maybe that would give some clue as to where it gets stuck.

Yeah, this is what's puzzling me. My make check passes, and it exercises both the C code and the functions I've written in Rust that my C code consumes via an internal FFI lib:

PASS: tests/unit_tests/runner
============================================================================
Testsuite summary for sentrypeer 4.0.3
============================================================================
# TOTAL: 1
# PASS:  1
# SKIP:  0
# XFAIL: 0
# FAIL:  0
# XPASS: 0
# ERROR: 0
============================================================================

   Compiling os_socketaddr v0.2.5
    Finished `test` profile [unoptimized + debuginfo] target(s) in 5m 47s
     Running unittests src/lib.rs (target/debug/deps/sentrypeer_rust-cd799ab1094bba43)

running 15 tests
test config::tests::test_config_default ... ok
test config::tests::test_config_from_env ... ok
test config::tests::test_load_certs ... ignored, not yet implemented
test config::tests::test_load_file ... ok
test config::tests::test_load_file_and_save ... ok
test config::tests::test_load_file_error ... ok
test json_logger::tests::test_bad_actor_to_json_rs ... ok
test json_logger::tests::test_json_to_bad_actor_rs ... ok
test sip::tests::test_listen ... ok
test tests::it_works ... ok
test tests::test_callback_from_c ... ok
test tests::test_free_string ... ok
test tests::test_return_bad_actor_new ... ok
test tests::test_return_exit_status ... ok
test tests::test_return_string ... ok

test result: ok. 14 passed; 0 failed; 1 ignored; 0 measured; 0 filtered out; finished in 0.01s

Relevant bits (cargo test gets run when you do make check):

And I can cd into sentrypeer_rust and run cargo test to go the other way: that runs my Rust lib code that uses parts of my C code, plus the pure Rust tests. Everything important is exercised, except my integration tests, which test program arguments; those hang.

The process starts by running clap to look for args. Maybe there's something in there where it can't inspect things, as I know clap doesn't use argv/argc directly. Just a wild guess, which I don't think is right.

Thanks.

Rust's std::env::ArgsOs should use the same argc and argv provided by libc. And even if given invalid input, clap should return an error; it should never hang the program.

So I don't think command line parsing can be the cause. IMO, threading is the more likely culprit.

By the way, I'm not familiar with the code base: what is the entry point into the Rust part that gets called from C? Do you have tracing instrumentation? Tracing can be very handy for diagnosing problems in highly concurrent code such as network servers.

Everything starts here: SentryPeer/src/sentrypeer.c (main branch of SentryPeer/SentryPeer on GitHub).

There's a debug mode via -d and a verbose mode via -v, but we don't even get that far. I'd at least see everything in gdb anyway.

It's something really fundamental and low level. The first Rust parts start here: SentryPeer/src/conf.c (main branch of SentryPeer/SentryPeer on GitHub).

OK, I'll give it a shot when I have time, but I can't make any promises.

Thanks. I was hoping this was something obvious. I was leaning towards linking issues, but all the test suites pass. Argh.

diff --git a/src/sentrypeer.c b/src/sentrypeer.c
index 382055b..f0771b8 100644
--- a/src/sentrypeer.c
+++ b/src/sentrypeer.c
@@ -38,6 +38,8 @@ volatile sig_atomic_t cleanup_flag = 0;
 
 int main(int argc, char **argv)
 {
+	fprintf(stderr, "Starting main()\n");
+
 	sentrypeer_config *config = sentrypeer_config_new();
 	assert(config);

So it's something in the Rust <-> C bit:

./sentrypeer 
Starting main()

Time for some print debugging 🙂

Very odd. debug_mode should be true here, as -d was passed in AND we're past process_cli, which is handled by clap now:

Reading symbols from sentrypeer...
(gdb) break src/sentrypeer.c:49
Breakpoint 1 at 0x7267c: file src/sentrypeer.c, line 50.
(gdb) run -dras
Starting program: /home/ghenry/sentrypeer-src/sentrypeer -dras
End of procress_cli_rs()

Breakpoint 1, main (argc=2, argv=<optimized out>) at src/sentrypeer.c:50
50		if (config->syslog_mode) {
(gdb) p config
$1 = (sentrypeer_config *) 0x7ffff6a7fc10
(gdb) p config->debug_mode
$2 = false
(gdb) p config->sip_mode
$3 = true

It's going to be something dumb, I know it!

Glad you found it, but it's really strange that the problem only occurred on Alpine.

I think I might have got it. To quote a paragraph from the documentation of std::env::args_os():

On glibc Linux systems, arguments are retrieved by placing a function in .init_array. glibc passes argc, argv, and envp to functions in .init_array, as a non-standard extension. This allows std::env::args_os to work even in a cdylib or staticlib, as it does on macOS and Windows.

My understanding of this paragraph is that std::env::args_os() only works from a crate-type="staticlib" when using glibc (or on Windows or macOS), but not with other Linux libc implementations such as musl.
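If that reading is right, the Rust side could at least detect the situation instead of silently parsing an empty argument list. A minimal sketch; process_args_or_none() is a hypothetical name. A normally started process sees at least argv[0], so an empty std::env::args_os() strongly suggests the standard library never received argc/argv:

```rust
use std::ffi::OsString;

/// Hypothetical helper: collect the process arguments as std sees them,
/// returning None when the list is empty (e.g. Rust code in a staticlib on a
/// platform where std never received argc/argv from the C main()).
fn process_args_or_none() -> Option<Vec<OsString>> {
    let args: Vec<OsString> = std::env::args_os().collect();
    if args.is_empty() {
        // Not even argv[0]: the C caller should pass argc/argv in explicitly.
        None
    } else {
        Some(args)
    }
}
```

Failing loudly on None would have turned the quiet "debug_mode is false" symptom into an obvious error.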

I also found a comment in the source code indicating that it makes a difference whether or not the Rust standard library provides main(). The comment talks about duplicated work on GNU libc, but I think it applies to libc in general:

If my theory is right, the fix should be easy: just pass argc and argv to the Rust side and use clap::Parser::parse_from() instead of parse(), which relies on std::env::args_os().
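The std-only half of that fix, turning C's argc/argv into something parse_from() accepts, could look like the sketch below. Assumptions: a Unix target (for OsStrExt), and args_from_c() is a hypothetical helper name. Building OsStrings from the raw bytes, rather than calling CStr::to_str(), keeps non-UTF-8 arguments intact instead of panicking on them:

```rust
use std::ffi::{c_char, CStr, OsStr, OsString};
use std::os::unix::ffi::OsStrExt;

/// Hypothetical helper: copy C's argc/argv into owned OsStrings.
///
/// # Safety
/// `argv` must be null or point to at least `argc` valid, NUL-terminated
/// C strings (e.g. the values passed unchanged from C's `main()`).
unsafe fn args_from_c(argc: usize, argv: *const *const c_char) -> Vec<OsString> {
    if argv.is_null() {
        return Vec::new();
    }
    // SAFETY: the caller guarantees `argv` has `argc` readable entries.
    let raw = unsafe { std::slice::from_raw_parts(argv, argc) };
    raw.iter()
        .map(|&arg| {
            // SAFETY: each entry is a valid NUL-terminated string per the contract.
            let bytes = unsafe { CStr::from_ptr(arg) }.to_bytes();
            OsStr::from_bytes(bytes).to_os_string()
        })
        .collect()
}
```

The resulting Vec<OsString> can be handed to clap's Parser::parse_from(), which accepts any iterator whose items implement Into<OsString> + Clone.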

I was actually dreaming about this last night, and about the fact that I read this ages ago while trying to understand why I didn't need to pass argv and argc to the Rust side. I completely forgot, and missed the glibc part!

Great catch and a big thank you for reading everything. I'll have a play tonight.

I'll definitely be adding this to my book too!

Thanks again @nerditation for all your help. This is working now:

C side:

int process_cli(sentrypeer_config *config, int argc, char **argv)
{
	// Check env vars first
	process_env_vars(config);

#if HAVE_RUST != 0
	process_cli_rs(config, argc, argv);
#else
	int cli_option;

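	while ((cli_option = getopt(argc, argv, "hVvf:l:b:c:i:w:jpdrRas")) !=
	       -1) {
		/* ... getopt option handling not shown in the original post ... */
	}
#endif

	/* ... rest of process_cli() not shown ... */
}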

Rust side:

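use std::ffi::{c_char, CStr};

use clap::Parser; // brings Parser::parse_from() into scope for Args

#[no_mangle]
pub(crate) unsafe extern "C" fn process_cli_rs(
    sentrypeer_c_config: *mut sentrypeer_config,
    argc: usize,
    argv: *mut *mut c_char,
) -> i32 {
    // Convert argc and argv from C to Vec<&str>.
    // std::env::args_os() is empty in a staticlib on musl, so use what C passed in.
    let args_from_c = std::slice::from_raw_parts(argv, argc)
        .iter()
        .map(|&arg| {
            CStr::from_ptr(arg)
                .to_str()
                .expect("Failed to convert CStr to str for command line argument processing.")
        })
        .collect::<Vec<&str>>();

    let args = Args::parse_from(args_from_c);

    // ... rest of process_cli_rs() not shown in the original post ...
}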

Will leave this here for others.