Need advice for tracking down a deadlock

#1

I think I’ve found a deadlock in either Rust or proptest. When I run my code for long periods of time, it eventually hangs on one line of my code: panic!(). In lldb I can see that the panic!() is stuck waiting on a futex. However, when I try to trim the code down so that I can file a bug report, the deadlock goes away, which means that anyone who tries to replicate what I’m seeing could be in for a real headache.

Ideally, there would be some magic command that would allow me to record what happened and report it to the team, even without code. Does anyone know of such a command, or have any other advice?

What I’ve tried

  • Setting RUST_LOG=trace (nothing useful comes of this; I can see that the statement immediately before the panic!() executes, and nothing after that)
  • Using helgrind (it never terminates, nor does it tell me where the deadlock is)
  • Removing every lock and every piece of thread-aware code that I have, just in case I’m causing the deadlock (no change)
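
To make the symptom concrete, here is a minimal sketch of one way a panic!() can end up waiting on a lock. It is not taken from my code, and I don’t know that this is what is happening here; the function name hook_would_block and the shared Mutex are made up for illustration. The hook uses try_lock so that the example reports the would-be hang instead of actually hanging.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex, TryLockError};
use std::thread;

/// Hypothetical illustration: returns true if a panic hook that needs
/// `shared` found it unavailable because the panicking thread holds it.
fn hook_would_block() -> bool {
    let shared = Arc::new(Mutex::new(()));
    let blocked = Arc::new(AtomicBool::new(false));

    let hook_shared = Arc::clone(&shared);
    let hook_blocked = Arc::clone(&blocked);
    let previous_hook = panic::take_hook();
    panic::set_hook(Box::new(move |_info| {
        // A hook that called `lock()` here could deadlock (or panic again).
        // `try_lock` only records that the lock was unavailable.
        let unavailable = matches!(
            hook_shared.try_lock(),
            Err(TryLockError::WouldBlock)
        );
        if unavailable {
            hook_blocked.store(true, Ordering::SeqCst);
        }
    }));

    let worker_shared = Arc::clone(&shared);
    let worker = thread::spawn(move || {
        let _guard = worker_shared.lock().unwrap();
        // The hook runs on this thread before unwinding drops `_guard`.
        panic!("panicking while the lock is held");
    });
    let joined = worker.join();

    // Restore whatever hook was installed before the experiment.
    panic::set_hook(previous_hook);
    joined.is_err() && blocked.load(Ordering::SeqCst)
}
```

Calling hook_would_block() from a test or from main should return true, since the hook runs while the guard is still alive. If something similar is going on in my case, the lock would have to live in code I did not write, since removing my own locks made no difference.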

Meta

$ uname -a
Linux EMANE 4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release --all
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.2 LTS
Release:	18.04
Codename:	bionic

$ rustc -vV
rustc 1.35.0-nightly (3eb4890df 2019-03-19)
binary: rustc
commit-hash: 3eb4890dfe6db0279fdd3cda19f9643873ae3db9
commit-date: 2019-03-19
host: x86_64-unknown-linux-gnu
release: 1.35.0-nightly
LLVM version: 8.0

$ cargo -vV
cargo 1.35.0-nightly (0e35bd8af 2019-03-13)
release: 1.35.0
commit-hash: 0e35bd8af0ec72d3225c4819b330b94628f0e9d0
commit-date: 2019-03-13

#2

You can try rr (https://rr-project.org/). It records an execution so that you can replay it deterministically under a debugger.

#3

Just tried it, but I keep getting a fatal error:

rr: Saving execution to trace directory `/home/cfkaran2/.local/share/rr/py_disc_model-740fd5138be0b9d3-2'.
[FATAL /home/roc/rr/rr/src/PerfCounters.cc:263:start_counter() errno: ENOENT] Unable to open performance counter with 'perf_event_open'; are perf events enabled? Try 'perf record'.
=== Start rr backtrace:
rr(_ZN2rr13dump_rr_stackEv+0x28)[0x55faa8]
rr(_ZN2rr15notifying_abortEv+0x47)[0x55fb27]
rr[0x4b9fea]
rr[0x4bad2c]
rr(_ZN2rr12PerfCountersC1Ei+0x35)[0x4bb895]
rr(_ZN2rr4TaskC2ERNS_7SessionEiijNS_13SupportedArchE+0x5c)[0x545fac]
rr(_ZN2rr10RecordTaskC1ERNS_13RecordSessionEijNS_13SupportedArchE+0x27)[0x4fed37]
rr(_ZN2rr13RecordSession8new_taskEiijNS_13SupportedArchE+0x3b)[0x4c285b]
rr(_ZN2rr4Task5spawnERNS_7SessionERKNS_8ScopedFdEPS3_PiRKNS_11TraceStreamERKSsRKSt6vectorISsSaISsEESH_i+0x218)[0x549a28]
rr(_ZN2rr13RecordSessionC2ERKSsRKSt6vectorISsSaISsEES7_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingENS_7BindCPUE+0x202)[0x4cb082]
rr(_ZN2rr13RecordSession6createERKSt6vectorISsSaISsEES5_RKNS_20DisableCPUIDFeaturesENS0_16SyscallBufferingENS_7BindCPUE+0x7df)[0x4cba9f]
rr(_ZN2rr13RecordCommand3runERSt6vectorISsSaISsEE+0x9e3)[0x4c11a3]
rr(main+0x244)[0x452504]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f985a127b97]
rr[0x4529ef]
=== End rr backtrace
Aborted

Trying out perf test yields:

$ perf test
 1: vmlinux symtab matches kallsyms            : FAILED!
 2: Detect openat syscall event                : FAILED!
 3: Detect openat syscall event on all cpus    : FAILED!
 4: Read samples using the mmap interface      : FAILED!
 5: Test data source output                    : Ok
 6: Parse event definition strings             : FAILED!
 7: Simple expression parser                   : Ok
 8: PERF_RECORD_* events & perf_sample fields  : Ok
 9: Parse perf pmu format                      : Ok
10: DSO data read                              : Ok
11: DSO data cache                             : Ok
12: DSO data reopen                            : Ok
13: Roundtrip evsel->name                      : Ok
14: Parse sched tracepoints fields             : FAILED!
15: syscalls:sys_enter_openat event fields     : FAILED!
16: Setup struct perf_event_attr               : FAILED!
17: Match and link multiple hists              : Ok
18: 'import perf' in python                    : Ok
19: Breakpoint overflow signal handler         : Ok
20: Breakpoint overflow sampling               : Ok
21: Number of exit events of a simple workload : Skip
22: Software clock events period values        : Ok
23: Object code reading                        : FAILED!
24: Sample parsing                             : Ok
25: Use a dummy software event to keep tracking: Skip
26: Parse with no sample_id_all bit set        : Ok
27: Filter hist entries                        : Ok
28: Lookup mmap thread                         : Ok
29: Share thread mg                            : Ok
30: Sort output of hist entries                : Ok
31: Cumulate child hist entries                : Ok
32: Track with sched_switch                    : Ok
33: Filter fds with revents mask in a fdarray  : Ok
34: Add fd to a fdarray, making it autogrow    : Ok
35: kmod_path__parse                           : Ok
36: Thread map                                 : Ok
37: LLVM search and compile                    :
37.1: Basic BPF llvm compile                    : Ok
37.2: kbuild searching                          : Ok
37.3: Compile source for BPF prologue generation: Ok
37.4: Compile source for BPF relocation         : Ok
38: Session topology                           : Ok
39: BPF filter                                 :
39.1: Basic BPF filtering                      : Skip
39.2: BPF pinning                              : Skip
39.3: BPF prologue generation                  : Skip
39.4: BPF relocation checker                   : Skip
40: Synthesize thread map                      : Ok
41: Remove thread map                          : Ok
42: Synthesize cpu map                         : Ok
43: Synthesize stat config                     : Ok
44: Synthesize stat                            : Ok
45: Synthesize stat round                      : Ok
46: Synthesize attr update                     : Ok
47: Event times                                : Ok
48: Read backward ring buffer                  : Skip
49: Print cpu map                              : Ok
50: Probe SDT events                           : Skip
51: is_printable_array                         : Ok
52: Print bitmap                               : Ok
53: perf hooks                                 : Ok
54: builtin clang support                      : Skip (not compiled in)
55: unit_number__scnprintf                     : Ok
56: x86 rdpmc                                  : FAILED!
57: Convert perf time to TSC                   : FAILED!
58: DWARF unwind                               : Ok
59: x86 instruction decoder - new instructions : Ok

I’m guessing that the failures mean that rr won’t work, correct?
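
One thing that could be checked before giving up on rr is the kernel’s perf_event_paranoid setting. Below is a minimal sketch, assuming rr needs a level of 1 or lower, as I understand its setup notes to say; the function names are made up for illustration. This only rules out a permissions cause. The ENOENT above may instead mean that the hardware counters are not available at all (for example inside a virtual machine), which this check cannot detect.

```rust
use std::fs;
use std::io;

/// Kernel setting that restricts unprivileged access to perf events.
const PARANOID_PATH: &str = "/proc/sys/kernel/perf_event_paranoid";

/// Parses the file contents, e.g. "3\n" becomes Some(3).
fn parse_paranoid_level(contents: &str) -> Option<i32> {
    contents.trim().parse().ok()
}

/// Assumption: rr's setup notes ask for a level of 1 or lower.
fn level_permits_rr(level: i32) -> bool {
    level <= 1
}

/// Reads the current level; Ok(None) means the file did not hold an integer.
fn current_paranoid_level() -> io::Result<Option<i32>> {
    let contents = fs::read_to_string(PARANOID_PATH)?;
    Ok(parse_paranoid_level(&contents))
}
```

Calling current_paranoid_level() and passing the result to level_permits_rr shows whether the setting alone would stop rr from opening its counters.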