My Rust project randomly freezes after hours, and doesn't start as non-root

I've got this really ugly project that I cobbled together for myself, adding feature after feature over the years. It worked quite well until recently, but now it won't really start as a non-privileged user anymore (without any clear indication as to what went wrong; it just freezes and stops accepting input).

And when run as root, it does run for a few hours or days and then does the same thing it does a few seconds after startup for other users: it freezes and no longer accepts any input.

How would I go about finding the cause of these freezes, and do you have suggestions for how to get this project into better shape so I can continue using and extending it over the coming years?

Okay, so now I've found that on the machine I'm currently targeting (where I've observed the issues described above), it starts and runs fine again, at least for the few minutes I've watched it while typing this up, when I remove the mqtt_relay feature flag, which I haven't used on this machine for quite some time.

So I'll next try to have a closer look at the code activated by that...

What's the time frame of that "recently"? Did you happen to add/remove/update any features/dependencies lately? Any lag/jank/freezing observed beforehand?

So it does start: it just doesn't do what you expect it to. I don't really have the time or the patience to go through the whole codebase to figure out what sort of "freezing" in relation to what kind of "inputs" you mean, though. Can you briefly describe what your code is supposed to do?

Bonus points if you could break it down into a diagram or similar: starting from main(). That might make it a whole lot easier to analyze: for yourself as much as others.

Might be because of a deadlock somewhere down the chain of command. You have a bunch of Arc<RwLock<...>>s + mpsc::channels + tokio::spawns all over the place. It's quite possible that somewhere between a few seconds and a few hours/days in, your whole execution simply comes to a halt because of a poorly thought-out synchronization mechanism. One of the reasons why:

Is generally not the ... best approach. Especially if you just keep piling new features on top of the older ones without ever stopping to consider whether you're even able to understand what in the world is going on with your code. Ideally at a first glance.

If you've literally spent years bringing it to the point of no longer being able to tell what it does (else you probably wouldn't be here today), you've got to be ready to spend at least a handful of hours (if not days or weeks, worst case) figuring out what's happening to it. Start with basic isolation:

  • change/remove dependencies
  • turn specific features on/off
  • add some proper logging
  • write a bunch of solid tests

Decouple the specifics of the implementation (Arc<RwLock<...>>) from the overall intent (struct Shared<T> { arc_lock: ... } could be a good start). Compartmentalize the specifics of different features to their own mod if not their own crate within the workspace. See if you can abstract away the lifecycle of your whole project itself into some trait Feature that could be implemented for each of them independently all the while minimizing the cognitive load of their actual use.

// example (sketch; one possible shape of Shared and Feature)
pub struct Shared<T> {
    arc_lock: std::sync::Arc<std::sync::RwLock<T>>,
}

pub trait Feature<T> {
    fn init(data: Shared<T>) -> Self;
    fn send_to(&self, data: Shared<T>);
    fn recv_from(&mut self, data: Shared<T>);
}

Keep your "refactoring" hat on until you've figured out a way to "maintain" it well. Don't even think about "using" and "extending" it further until you're done with the bare minimum. It might not be that much fun at first, yet that's pretty much the only way you'll ever write anything not abominably "ugly".


Compile with debug information, then attach a debugger when it hangs and capture stack traces.


Is there a guide somewhere on how to do that that you can link me to? Can I do that on a CLI-/SSH-only box without any GUI IDE like VS Code?

Yes, I consider this a likely culprit. I'm just not sure how to analyze the situation while minimizing the amount of time I have to invest... This is "just" a hobby project of mine.

But thank you for your many suggestions! I will try to make a diagram of what I believe is supposed to happen when and where in the code - that should make it easier to understand the project and maybe also easier to find approaches to simplify / improve the design and make it more robust against deadlocks.

I agree that tests are nice, I just have no idea how to make the concept work for a project like this, which collects live data from various sources, passes it on, and caches/aggregates some of it for later consumption. I will try to find time to read the link you provided about writing tests in Rust.

Yes. You can use gdb with a Rust executable in the same way you'd use it with a C/C++ executable. So all the doc for gdb applies. What I do:

First start the executable <exe> and get its pid.
Then attach `gdb`:

    $ gdb <exe> <pid> | tee <out>

Where <exe> is the full executable path, <pid> is the running 
program's pid, and <out> is an output file to capture the trace.

You'll get a (gdb) prompt and you can enter the following
command to dump the stack trace to the output file:

    (gdb) thread apply all where

https://www.baeldung.com/linux/program-hang-location#gnu-debugger and
https://stackoverflow.com/a/22980319/1362755


Well, I have very little experience with those, and no idea how I'd debug with any of them (outside an IDE that makes it easy)...

Is there some way I can just dump all the call stacks of my Rust program to a logfile while it hangs, for later analysis? (There would need to be multiple call stacks, since I have many tokio threads.)

I don't want to take the time to interactively inspect the process while it hangs, since that would make me lose data I'd like to collect; instead I usually restart, within seconds, a different version with different parameters that still works fine, to prevent extensive data loss.

The two replies above this one have instructions for doing just that.

What @jumpnbrownweasel posted was a guide on how to use gdb interactively, not how to quickly dump the info I need in a way I can incorporate into a non-interactive script.

The stackoverflow link posted by @the8472 has some suggestions that look like it does what I need, but I don't know what the <coredump> placeholder should be filled with:
gdb <binary> <coredump> -ex "thread apply all bt" -ex "detach" -ex "quit" > output.log

I have no experience with gdb, therefore it's hard for me to understand how it works / how it's used.

UPDATE: So it seems I can fill <coredump> with the PID of my hung process, just as in the example shown by @jumpnbrownweasel.

Now my current problem is how to correlate the threads in this output with my own code: can I name threads when using tokio::spawn?

UPDATE 2: Okay, so it seems I can't easily set thread names that way, according to this:


I posted two links because I didn't quickly find a single location answering your question. The first one shows how to attach gdb to a running process by pid. The second one shows how to get the backtraces for all threads (something you usually need to diagnose deadlocks).

Coredumps are primarily for crashed processes, though you can create one for a running process too with gcore.


I don't know how to name threads in Rust and I have rarely needed to do that. Instead I look at the method names, and line numbers within.

I normally dump the stack traces several times, with a short (maybe 1s) pause between dumps. Then I look for threads that are consistently blocked but shouldn't be, or for threads that look like they're looping and they shouldn't be.

It helps a lot to additionally have a log of application activity, including "start X"/"end X" log messages, and timestamps of course. Then you can see what activities have begun but not finished, and those that should have finished given the amount of time that has elapsed. And you can see the incomplete activities at the time of the hang.

By putting these two things together you can normally see where the problem is. However, it can be very tedious and time-consuming!


I first tried to use what I already have (= log lines) to find the cause of this deadlock/freeze-up. Sadly, I've run out of ideas for where else to put log statements. Also, I can't make sense of why this is related to my mqtt_relay component, which I only just added to this machine's instance of my program, and which doesn't do anything yet since there are no MQTT clients sending anything to this machine...

I'll try some more with gdb, but its output looks very confusing and difficult for me to interpret.

I've stumbled upon tokio-rs/console (a debugger for async Rust) on GitHub and want to see if I can utilize it for my project and current problem.

Stack traces shouldn't be too confusing, so please post some snippets along with your questions.

32k char sample
GNU gdb 15.0.50.20240403-0ubuntu1
...
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38

Thread 8 (Thread 0x7748ba3ff6c0 (LWP 2933862) "tokio-runtime-w"):
#0 0x00007748ba4ecadf in GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7748ba3fdeb0, rem=0x7748ba3fdeb0) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78
#1 0x00007748ba4f9a27 in GI___nanosleep (req=, rem=) at ../sysdeps/unix/sysv/linux/nanosleep.c:25
#2 0x000060f1a7c28950 in std::sys::pal::unix::thread::thread::sleep () at library/std/src/sys/pal/unix/thread.rs:279
#3 std::thread::sleep () at library/std/src/thread/mod.rs:887
#4 0x000060f1a7991fd1 in rust_influxdb_udp_logger::worx::http_client_loop::{async_fn#0} () at src/worx.rs:427
#5 0x000060f1a794c862 in tokio::runtime::task::core::{impl#6}::poll::{closure#0}<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (ptr=0x60f1c388e630) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/core.rs:331
#6 tokio::loom::std::unsafe_cell::UnsafeCell<tokio::runtime::task::core::Stage<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}>>::with_mut<tokio::runtime::task::core::Stage<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}>, core::task::poll::Poll<()>, tokio::runtime::task::core::{impl#6}::poll::{closure_env#0}<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>> (self=0x60f1c388e630, f=...) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/loom/std/unsafe_cell.rs:16
#7 tokio::runtime::task::core::Core<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>::poll<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (cx=..., self=<error reading variable: Cannot access memory at address 0x0>) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/core.rs:320
#8 tokio::runtime::task::harness::poll_future::{closure#0}<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> () at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/harness.rs:532
#9 core::panic::unwind_safe::{impl#23}::call_once<core::task::poll::Poll<()>, tokio::runtime::task::harness::poll_future::{closure_env#0}<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>> (self=<error reading variable: Cannot access memory at address 0x18>) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272
#10 std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>>, core::task::poll::Poll<()>> (data=<error reading variable: Cannot access memory at address 0x0>) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:589
#11 std::panicking::try<core::task::poll::Poll<()>, core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>>> (f=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:552
#12 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>>, core::task::poll::Poll<()>> (f=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:359
#13 tokio::runtime::task::harness::poll_future<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (cx=..., core=<error reading variable: Cannot access memory at address 0x0>) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/harness.rs:520
#14 tokio::runtime::task::harness::Harness<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>::poll_inner<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (self=) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/harness.rs:209
#15 tokio::runtime::task::harness::Harness<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>::poll<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (self=...) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/harness.rs:154
#16 tokio::runtime::task::raw::poll<rust_influxdb_udp_logger::worx::http_client_loop::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (ptr=...) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/raw.rs:271
#17 0x000060f1a7c3b9d4 in tokio::runtime::task::raw::RawTask::poll (self=...) at src/runtime/task/raw.rs:201
#18 tokio::runtime::task::LocalNotified<alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>::run<alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (self=...) at src/runtime/task/mod.rs:463
#19 0x000060f1a7c3b9d4 in tokio::runtime::scheduler::multi_thread::worker::{impl#1}::run_task::{closure#0} ()
#20 tokio::task::coop::with_budget<core::result::Result<alloc::boxed::Box<tokio::runtime::scheduler::multi_thread::worker::Core, alloc::alloc::Global>, ()>, tokio::runtime::scheduler::multi_thread::worker::{impl#1}::run_task::{closure_env#0}> (budget=..., f=...) at src/task/coop/mod.rs:167
#21 tokio::task::coop::budget<core::result::Result<alloc::boxed::Box<tokio::runtime::scheduler::multi_thread::worker::Core, alloc::alloc::Global>, ()>, tokio::runtime::scheduler::multi_thread::worker::{impl#1}::run_task::{closure_env#0}> (f=...) at src/task/coop/mod.rs:133
#22 tokio::runtime::scheduler::multi_thread::worker::Context::run_task (self=0x7748ba3fe9b0, task=..., core=) at src/runtime/scheduler/multi_thread/worker.rs:594
#23 0x000060f1a7c41e8b in tokio::runtime::scheduler::multi_thread::worker::Context::run (self=, core=0x60f1c37df320) at src/runtime/scheduler/multi_thread/worker.rs:542
#24 tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0} () at src/runtime/scheduler/multi_thread/worker.rs:507
#25 tokio::runtime::context::scoped::Scopedtokio::runtime::scheduler::Context::set<tokio::runtime::scheduler::Context, tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}, ()> (self=0x7748ba3ff490, t=, f=...) at src/runtime/context/scoped.rs:40
#26 tokio::runtime::context::set_scheduler::{closure#0}<(), tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}> (c=0x7748ba3ff468) at src/runtime/context.rs:176
#27 std::thread::local::LocalKeytokio::runtime::context::Context::try_with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(), tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}>, ()> (f=..., self=) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:311
#28 std::thread::local::LocalKeytokio::runtime::context::Context::with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(), tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}>, ()> (f=..., self=) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:275
#29 tokio::runtime::context::set_scheduler<(), tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}> (v=, f=...) at src/runtime/context.rs:176
#30 tokio::runtime::scheduler::multi_thread::worker::run::{closure#0} () at src/runtime/scheduler/multi_thread/worker.rs:502
#31 tokio::runtime::context::runtime::enter_runtime<tokio::runtime::scheduler::multi_thread::worker::run::{closure_env#0}, ()> (allow_block_in_place=true, f=..., handle=<error reading variable: Cannot access memory at address 0x0>) at src/runtime/context/runtime.rs:65
#32 tokio::runtime::scheduler::multi_thread::worker::run (worker=...) at src/runtime/scheduler/multi_thread/worker.rs:494
#33 tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure#0} () at src/runtime/scheduler/multi_thread/worker.rs:460
#34 tokio::runtime::blocking::task::{impl#2}::poll<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}, ()> (self=..., cx=<error reading variable: Cannot access memory at address 0x0>) at src/runtime/blocking/task.rs:42
#35 tokio::runtime::task::core::{impl#6}::poll::{closure#0}<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule> (ptr=0x60f1c37e0938) at src/runtime/task/core.rs:331
#36 tokio::loom::std::unsafe_cell::UnsafeCell<tokio::runtime::task::core::Stage<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}>>::with_mut<tokio::runtime::task::core::Stage<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}>, core::task::poll::Poll<()>, tokio::runtime::task::core::{impl#6}::poll::{closure_env#0}<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule>> (self=0x60f1c37e0938, f=...) at src/loom/std/unsafe_cell.rs:16
#37 tokio::runtime::task::core::Core<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule>::poll<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule> (self=, cx=<error reading variable: Cannot access memory at address 0x10>) at src/runtime/task/core.rs:320
#38 tokio::runtime::task::harness::poll_future::{closure#0}<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule> () at src/runtime/task/harness.rs:532
#39 core::panic::unwind_safe::{impl#23}::call_once<core::task::poll::Poll<()>, tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule>> (self=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272
#40 std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule>>, core::task::poll::Poll<()>> (data=<error reading variable: Cannot access memory at address 0x0>) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:589
#41 std::panicking::try<core::task::poll::Poll<()>, core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule>>> (f=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:552
#42 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule>>, core::task::poll::Poll<()>> (f=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:359
#43 tokio::runtime::task::harness::poll_future<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule> (core=, cx=...) at src/runtime/task/harness.rs:520
#44 tokio::runtime::task::harness::Harness<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule>::poll_inner<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule> (self=) at src/runtime/task/harness.rs:209
#45 tokio::runtime::task::harness::Harness<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule>::poll<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule> (self=...) at src/runtime/task/harness.rs:154
#46 tokio::runtime::task::raw::poll<tokio::runtime::blocking::task::BlockingTasktokio::runtime::scheduler::multi_thread::worker:{impl#0}::launch::{closure_env#0}, tokio::runtime::blocking::schedule::BlockingSchedule> (ptr=...) at src/runtime/task/raw.rs:271
#47 0x000060f1a7c3a652 in tokio::runtime::task::raw::RawTask::poll (self=...) at src/runtime/task/raw.rs:201
#48 tokio::runtime::task::UnownedTasktokio::runtime::blocking::schedule::BlockingSchedule::runtokio::runtime::blocking::schedule::BlockingSchedule (self=...) at src/runtime/task/mod.rs:500
#49 0x000060f1a7c3a297 in tokio::runtime::blocking::pool::Task::run (self=...) at src/runtime/blocking/pool.rs:161
#50 tokio::runtime::blocking::pool::Inner::run (self=0x60f1c37dd590, worker_thread_id=0) at src/runtime/blocking/pool.rs:511
#51 tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure#0} () at src/runtime/blocking/pool.rs:469
#52 std::sys::backtrace::rust_begin_short_backtrace<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()> (f=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/sys/backtrace.rs:152
#53 0x000060f1a7c3d472 in std::thread::{impl#0}::spawn_unchecked
::{closure#1}::{closure#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()> () at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:559
#54 core::panic::unwind_safe::{impl#23}::call_once<(), std::thread::{impl#0}::spawn_unchecked
::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>> (self=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272
#55 std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked
::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>>, ()> (data=<error reading variable: Cannot access memory at address 0x0>) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:589
#56 std::panicking::try<(), core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked
::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>>> (f=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:552
#57 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked
::{closure#1}::{closure_env#0}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>>, ()> (f=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:359
#58 std::thread::{impl#0}::spawn_unchecked
::{closure#1}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()> () at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/mod.rs:557
#59 core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked
::{closure_env#1}<tokio::runtime::blocking::pool::{impl#6}::spawn_thread::{closure_env#0}, ()>, ()> () at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250
#60 0x000060f1a7c352db in alloc::boxed::{impl#28}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:1966
#61 alloc::boxed::{impl#28}::call_once<(), alloc::boxed::Box<dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:1966
#62 std::sys::pal::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/pal/unix/thread.rs:109
#63 0x00007748ba49caa4 in start_thread (arg=) at ./nptl/pthread_create.c:447
#64 0x00007748ba529c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Thread 7 (Thread 0x7748ba1fe6c0 (LWP 2933863) "tokio-runtime-w"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x000060f1a7c28c17 in std::sys::pal::unix::futex::futex_wait () at library/std/src/sys/pal/unix/futex.rs:72
#2 std::sys::sync::thread_parking::futex::Parker::park () at library/std/src/sys/sync/thread_parking/futex.rs:55
#3 std::thread::thread::park () at library/std/src/thread/mod.rs:1446
#4 std::thread::park () at library/std/src/thread/mod.rs:1083
#5 0x000060f1a7b8add5 in flume::signal::SyncSignal::wait (self=) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/flume-0.11.0/src/signal.rs:31
#6 flume::Hook<(), flume::signal::SyncSignal>::wait_recv<()> (self=0x7748ac0011b0, abort=0x60f1c37fc940) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/flume-0.11.0/src/lib.rs:345
#7 flume::{impl#26}::recv_sync::{closure#1}<()> (hook=...) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/flume-0.11.0/src/lib.rs:638
#8 flume::Shared<()>::recv<(), flume::signal::SyncSignal, core::result::Result<(), flume::TryRecvTimeoutError>, flume::{impl#26}::recv_sync::{closure_env#0}<()>, flume::{impl#26}::recv_sync::{closure_env#1}<()>> (self=0x60f1c37fc8c0, should_block=, make_signal=..., do_block=...) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/flume-0.11.0/src/lib.rs:601
#9 flume::Shared<()>::recv_sync<()> (self=0x60f1c37fc8c0, block=...) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/flume-0.11.0/src/lib.rs:609
#10 0x000060f1a7b8b784 in flume::Receiver<()>::recv<()> (self=) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/flume-0.11.0/src/lib.rs:884
#11 0x000060f1a7987f60 in rumqttd::link::local::LinkRx::recv (self=0x60f1c37e4780) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/rumqttd-0.19.0/src/link/local.rs:332
#12 rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn#0} () at src/mqtt_relay.rs:38
#13 0x000060f1a794c571 in tokio::runtime::task::core::{impl#6}::poll::{closure#0}<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (ptr=0x60f1c37e4730) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/core.rs:331
#14 tokio::loom::std::unsafe_cell::UnsafeCell<tokio::runtime::task::core::Stage<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}>>::with_mut<tokio::runtime::task::core::Stage<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}>, core::task::poll::Poll<()>, tokio::runtime::task::core::{impl#6}::poll::{closure_env#0}<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>> (self=0x60f1c37e4730, f=...) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/loom/std/unsafe_cell.rs:16
#15 tokio::runtime::task::core::Core<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>::poll<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (cx=..., self=<error reading variable: Cannot access memory at address 0x0>) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/core.rs:320
#16 tokio::runtime::task::harness::poll_future::{closure#0}<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> () at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/harness.rs:532
#17 core::panic::unwind_safe::{impl#23}::call_once<core::task::poll::Poll<()>, tokio::runtime::task::harness::poll_future::{closure_env#0}<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>> (self=<error reading variable: Cannot access memory at address 0x18>) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/panic/unwind_safe.rs:272
#18 std::panicking::try::do_call<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>>, core::task::poll::Poll<()>> (data=<error reading variable: Cannot access memory at address 0x0>) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:589
#19 std::panicking::try<core::task::poll::Poll<()>, core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>>> (f=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:552
#20 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<tokio::runtime::task::harness::poll_future::{closure_env#0}<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>>, core::task::poll::Poll<()>> (f=...) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panic.rs:359
#21 tokio::runtime::task::harness::poll_future<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (cx=..., core=<error reading variable: Cannot access memory at address 0x0>) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/harness.rs:520
#22 tokio::runtime::task::harness::Harness<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>::poll_inner<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (self=) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/harness.rs:209
#23 tokio::runtime::task::harness::Harness<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>::poll<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (self=...) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/harness.rs:154
#24 tokio::runtime::task::raw::poll<rust_influxdb_udp_logger::mqtt_relay::listener::{async_fn_env#0}, alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (ptr=...) at /home/user/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/tokio-1.45.1/src/runtime/task/raw.rs:271
#25 0x000060f1a7c3b9d4 in tokio::runtime::task::raw::RawTask::poll (self=...) at src/runtime/task/raw.rs:201
#26 tokio::runtime::task::LocalNotified<alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>>::run<alloc::sync::Arc<tokio::runtime::scheduler::multi_thread::handle::Handle, alloc::alloc::Global>> (self=...) at src/runtime/task/mod.rs:463
#27 0x000060f1a7c3b9d4 in tokio::runtime::scheduler::multi_thread::worker::{impl#1}::run_task::{closure#0} ()
#28 tokio::task::coop::with_budget<core::result::Result<alloc::boxed::Box<tokio::runtime::scheduler::multi_thread::worker::Core, alloc::alloc::Global>, ()>, tokio::runtime::scheduler::multi_thread::worker::{impl#1}::run_task::{closure_env#0}> (budget=..., f=...) at src/task/coop/mod.rs:167
#29 tokio::task::coop::budget<core::result::Result<alloc::boxed::Box<tokio::runtime::scheduler::multi_thread::worker::Core, alloc::alloc::Global>, ()>, tokio::runtime::scheduler::multi_thread::worker::{impl#1}::run_task::{closure_env#0}> (f=...) at src/task/coop/mod.rs:133
#30 tokio::runtime::scheduler::multi_thread::worker::Context::run_task (self=0x7748ba1fd9b0, task=..., core=) at src/runtime/scheduler/multi_thread/worker.rs:594
#31 0x000060f1a7c41e8b in tokio::runtime::scheduler::multi_thread::worker::Context::run (self=, core=0x60f1c37dfc20) at src/runtime/scheduler/multi_thread/worker.rs:542
#32 tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure#0} () at src/runtime/scheduler/multi_thread/worker.rs:507
#33 tokio::runtime::context::scoped::Scoped<tokio::runtime::scheduler::Context>::set<tokio::runtime::scheduler::Context, tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}, ()> (self=0x7748ba1fe490, t=, f=...) at src/runtime/context/scoped.rs:40
#34 tokio::runtime::context::set_scheduler::{closure#0}<(), tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}> (c=0x7748ba1fe468) at src/runtime/context.rs:176
#35 std::thread::local::LocalKey<tokio::runtime::context::Context>::try_with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(), tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}>, ()> (f=..., self=) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:311
#36 std::thread::local::LocalKey<tokio::runtime::context::Context>::with<tokio::runtime::context::Context, tokio::runtime::context::set_scheduler::{closure_env#0}<(), tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}>, ()> (f=..., self=) at /home/user/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/thread/local.rs:275
#37 tokio::runtime::context::set_scheduler<(), tokio::runtime::scheduler::multi_thread::worker::run::{closure#0}::{closure_env#0}> (v=, f=...) at src/runtime/context.rs:176
#38 tokio::runtime::scheduler::multi_thread::worker::run::{closure#0} () at src/runtime/scheduler/multi_thread/worker.rs:502
#39 tokio::runtime::context::runtime::enter_runtime<tokio::runtime::scheduler::multi_thread::worker::run::{closure_env#0}, ()> (allow_block_in_place=true, f=..., handle=<error reading variable: Cannot access memory at address 0x0>) at src/runtime/context/runtime.rs:65
#40 tokio::runtime::scheduler::multi_thread::worker::run (worker=...) at src/runtime/scheduler/multi_thread/worker.rs:494
#41 tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure#0} () at src/runtime/scheduler/multi_thread/worker.rs:460
#42 tokio::runtime::blocking::task::{impl#2}::poll<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}, ()> (self=..., _cx=<error reading variable: Cannot access memory at address 0x0>) at src/runtime/blocking/task.rs:42
#43 tokio::runtime::task::core::{impl#6}::poll::{closure#0}<tokio::runtime::blocking::task::BlockingTask<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}>, tokio::runtime::blocking::schedule::BlockingSchedule> (ptr=0x60f1c37e0cb8) at src/runtime/task/core.rs:331
#44 tokio::loom::std::unsafe_cell::UnsafeCell<tokio::runtime::task::core::Stage<tokio::runtime::blocking::task::BlockingTask<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}>>>::with_mut<tokio::runtime::task::core::Stage<tokio::runtime::blocking::task::BlockingTask<tokio::runtime::scheduler::multi_thread::worker::{impl#0}::launch::{closure_env#0}>>, core::task::poll::Poll<()>,

full log: (PrivateBin link)

To me it looks like something's wrong between the mqtt library and the tokio runtime, but I don't understand why or what...

You're right, it is quite difficult to see what is happening, since all we're seeing are tokio and mqtt calls. Unfortunately I have no experience with Tokio, but I've heard that tokio-console is often useful for debugging:
https://docs.rs/tokio-console/latest/tokio_console/
https://github.com/tokio-rs/console
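For reference, hooking the console up is typically a single call at application startup. This is a sketch, assuming the `console-subscriber` crate as a dependency; tokio's task instrumentation also needs the binary to be built with `RUSTFLAGS="--cfg tokio_unstable"`:

```rust
// Sketch: enable tokio-console instrumentation at startup.
// Assumes the `console-subscriber` crate is a dependency and the binary
// is compiled with RUSTFLAGS="--cfg tokio_unstable".
#[tokio::main]
async fn main() {
    // Starts the gRPC endpoint (default 127.0.0.1:6669) that the
    // `tokio-console` CLI connects to.
    console_subscriber::init();

    // ... rest of the application ...
}
```

With that in place, running `tokio-console` in another terminal shows live per-task state (busy, idle, polls), which is exactly the view used below.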


This is tokio-console looking at the same hang as in the gdb log I posted earlier:

One thing I do find strange is the "not yielding" part - I believe all four of those active tasks are only waiting for input.

For example, the last one (signal_notifier.rs:155) runs the process_output function further down in the same file, which spends most of its time waiting for the iterator to yield lines:

pub async fn process_output(stdout: ChildStdout, last_confirmed: Arc<RwLock<usize>>) {
    let reader = std::io::BufReader::new(stdout);
    for output in std::io::BufRead::lines(reader) {

I don't understand how that task could be shown as active = busy for all of the 13 seconds the program ran before I killed it to replace it with a working copy again - during those 13 seconds I can see in my log output that not a single line was yielded.

I haven't looked at all of your code, but I looked at one of the files mentioned in that console display and I see that you are performing blocking operations in async code:

pub async fn process_output(stdout: ChildStdout, last_confirmed: Arc<RwLock<usize>>) {
    let reader = std::io::BufReader::new(stdout);
    for output in std::io::BufRead::lines(reader) {

This will tie up one tokio executor thread and make it unable to make progress on other tasks assigned to it. You should be using tokio's async version of child process management, and similarly swap out any other uses of std::io and std::process.


thank you @kpreid :partying_face:

switching to tokio::process in my signal_notifier module makes it work together with the mqtt_relay module without freezing my application :grinning_face:

I've found that I can similarly keep the main.rs:764 = worx handler task from blocking a thread by using tokio::time::sleep instead of std::thread::sleep :heart:

You don't happen to have any idea how to also fix the blocking behavior of my two rumqtt tasks (broker & listener)?
