EXC_BAD_ACCESS on release build only

I'm having trouble debugging a program that only seems to crash in release mode (1.6.0 stable). It crashes on an EXC_BAD_ACCESS code 1 (bad address), and it seems to be something about jemalloc dropping some memory but I don't know enough about the allocations internals to tell. It works perfectly well in debug mode.

The code base itself is relatively small, so check out the repo. It depends on two external crates, which each use unsafe in one place each. astar and rust-typed-arena.

To reproduce, check out the code above and run cargo run --example phrase_morph --release --verbose. It should crash with Process didn't exit successfully: 'target/release/examples/phrase_morph' (signal: 11).

I'm not even sure what to start with, so here's the full thread backtrace from lldb (I'm on OSX 10.11):

* thread #1: tid = 0x59b17d, 0x000000010002e098 phrase_morph`je_tcache_bin_flush_small(tbin=0x000000010141d028, binind=0, rem=100, tcache=0x000000010141d000) + 200 at tcache.c:115, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x000000010002e098 phrase_morph`je_tcache_bin_flush_small(tbin=0x000000010141d028, binind=0, rem=100, tcache=0x000000010141d000) + 200 at tcache.c:115
(lldb) thread back
* thread #1: tid = 0x59b17d, 0x000000010002e098 phrase_morph`je_tcache_bin_flush_small(tbin=0x000000010141d028, binind=0, rem=100, tcache=0x000000010141d000) + 200 at tcache.c:115, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010002e098 phrase_morph`je_tcache_bin_flush_small(tbin=0x000000010141d028, binind=0, rem=100, tcache=0x000000010141d000) + 200 at tcache.c:115
    frame #1: 0x00000001000158cc phrase_morph`je_sdallocx [inlined] je_tcache_dalloc_small(tcache=<unavailable>, ptr=<unavailable>, binind=<unavailable>) + 46 at tcache.h:339
    frame #2: 0x000000010001589e phrase_morph`je_sdallocx [inlined] je_arena_sdalloc(chunk=<unavailable>) + 75 at arena.h:1190
    frame #3: 0x0000000100015853 phrase_morph`je_sdallocx [inlined] je_isdalloct(ptr=0x00000001014242f8) at jemalloc_internal.h:784
    frame #4: 0x0000000100015853 phrase_morph`je_sdallocx [inlined] je_isqalloc(ptr=<unavailable>) at jemalloc_internal.h:813
    frame #5: 0x0000000100015853 phrase_morph`je_sdallocx [inlined] je_tsd_fetch at jemalloc.c:1257
    frame #6: 0x0000000100015853 phrase_morph`je_sdallocx(ptr=0x00000001014242f8, size=<unavailable>, flags=<unavailable>) + 1107 at jemalloc.c:1896
    frame #7: 0x000000010000218b phrase_morph`main::h882e97d8b688f6e9iaa + 6395
    frame #8: 0x000000010000f463 phrase_morph`sys_common::unwind::try::try_fn::h4103587514840227558 + 35
    frame #9: 0x000000010000dab9 phrase_morph`__rust_try + 9
    frame #10: 0x000000010000f30a phrase_morph`rt::lang_start::h216753457f385fdaJCx + 554
    frame #11: 0x00007fff8ed575ad libdyld.dylib`start + 1
    frame #12: 0x00007fff8ed575ad libdyld.dylib`start + 1

If you see anywhere to start, I'll be forever grateful.

Crashes in the memory allocator normally indicate memory allocator misuse, e.g. double frees or overflowing a heap buffer. For dealing with this, you normally want a memory allocation debugger of some kind, but most of them want to interpose the malloc and free symbols, so I needed to prevent rust from using jemalloc. I was able to do that by switching to rust nightly and modiying your code as so:

diff --git a/Cargo.toml b/Cargo.toml
index 191db11..66b41c3 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -6,3 +6,6 @@ authors = ["Andrew Smith <andrewchristophersmith@gmail.com>"]
 [dependencies]
 astar = "0.1"
 edit-distance = "1.0"
+
+[profile.release]
+debug = true
diff --git a/examples/phrase_morph.rs b/examples/phrase_morph.rs
index e398697..032d634 100644
--- a/examples/phrase_morph.rs
+++ b/examples/phrase_morph.rs
@@ -1,3 +1,5 @@
+#![feature(alloc_system)]
+extern crate alloc_system;
 extern crate words;
 extern crate astar;

Also added debug symbols, because they're always useful.

Running it that way it's a bit more useful already (OSX 10.10.5):

phrase_morph(81444,0x7fff7b67c300) malloc: *** error for object 0x7f872ac04e00: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

and if I run it under valgrind, we get a stack trace for the double free itself:

$ valgrind target/release/examples/phrase_morph
==81457== Memcheck, a memory error detector
==81457== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==81457== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==81457== Command: target/release/examples/phrase_morph
==81457==
==81457== Invalid free() / delete / delete[] / realloc()
==81457==    at 0x1000392F7: free (in /usr/local/Cellar/valgrind/3.11.0/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==81457==    by 0x10000507A: collections..vec_deque..VecDeque$LT$collections..string..String$GT$::drop.6483::h072d914a0b6196bf (heap.rs:111)
==81457==    by 0x100002330: main::h76ecc2cb0e026be8jaa (phrase_morph.rs:14)
==81457==    by 0x10000F1D2: sys_common::unwind::try::try_fn::h13555085574443371706 (in target/release/examples/phrase_morph)
==81457==    by 0x10000D818: __rust_try (in target/release/examples/phrase_morph)
==81457==    by 0x10000F075: rt::lang_start::h83533f65c37de1a9Q4x (in target/release/examples/phrase_morph)
==81457==    by 0x10016E5C8: start (in /usr/lib/system/libdyld.dylib)
==81457==  Address 0x10388cdd0 is 0 bytes inside a block of size 3 free'd
==81457==    at 0x1000392F7: free (in /usr/local/Cellar/valgrind/3.11.0/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==81457==    by 0x1000038AB: collections..vec..Vec$LT$collections..vec..Vec$LT$collections..string..String$GT$$GT$::drop.5430::h3c6f44aaf7717b77 (heap.rs:111)
==81457==    by 0x10000231A: main::h76ecc2cb0e026be8jaa (lib.rs:272)
==81457==    by 0x10000F1D2: sys_common::unwind::try::try_fn::h13555085574443371706 (in target/release/examples/phrase_morph)
==81457==    by 0x10000D818: __rust_try (in target/release/examples/phrase_morph)
==81457==    by 0x10000F075: rt::lang_start::h83533f65c37de1a9Q4x (in target/release/examples/phrase_morph)
==81457==    by 0x10016E5C8: start (in /usr/lib/system/libdyld.dylib)
==81457==  Block was alloc'd at
==81457==    at 0x100038EBB: malloc (in /usr/local/Cellar/valgrind/3.11.0/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==81457==    by 0x100006B30: fmt::Write::write_fmt::Adapter$LT$$u27$a$C$$u20$T$GT$.Write::write_str::h17911570873508343231 (heap.rs:57)
==81457==    by 0x1000147F2: fmt::write::hb978accedf955189bPW (in target/release/examples/phrase_morph)
==81457==    by 0x1000072ED: string::T.ToString::to_string::h2808114158712291688 (mod.rs:129)
==81457==    by 0x10000725D: WordSearch$LT$$u27$a$GT$.SearchProblem::start::h378a0dbf234e3afdFha (lib.rs:132)
==81457==    by 0x10000109E: main::h76ecc2cb0e026be8jaa (lib.rs:279)
==81457==    by 0x10000F1D2: sys_common::unwind::try::try_fn::h13555085574443371706 (in target/release/examples/phrase_morph)
==81457==    by 0x10000D818: __rust_try (in target/release/examples/phrase_morph)
==81457==    by 0x10000F075: rt::lang_start::h83533f65c37de1a9Q4x (in target/release/examples/phrase_morph)
==81457==    by 0x10016E5C8: start (in /usr/lib/system/libdyld.dylib)
==81457==
==81457==
==81457== HEAP SUMMARY:
==81457==     in use at exit: 35,030 bytes in 426 blocks
==81457==   total heap usage: 473,699 allocs, 473,277 frees, 17,520,601 bytes allocated
==81457==
==81457== LEAK SUMMARY:
==81457==    definitely lost: 0 bytes in 0 blocks
==81457==    indirectly lost: 0 bytes in 0 blocks
==81457==      possibly lost: 0 bytes in 0 blocks
==81457==    still reachable: 128 bytes in 1 blocks
==81457==         suppressed: 34,902 bytes in 425 blocks
==81457== Rerun with --leak-check=full to see details of leaked memory
==81457==
==81457== For counts of detected and suppressed errors, rerun with: -v
==81457== ERROR SUMMARY: 4 errors from 1 contexts (suppressed: 0 from 0)

After forcing the system malloc, we also get the crash in a debug build, with a much better stack trace. lldb:

  * frame #0: 0x00007fff91c4208e libsystem_malloc.dylib`malloc_error_break
    frame #1: 0x00007fff91c33159 libsystem_malloc.dylib`free + 314
    frame #2: 0x00000001000020b8 phrase_morph`phrase_morph::heap::deallocate(ptr="", old_size=140734799803056, align=1) + 72 at heap.rs:111
    frame #3: 0x0000000100002040 phrase_morph`phrase_morph::raw_vec::RawVec<T>.Drop::drop(self=0x0000000102016ef0) + 176 at raw_vec.rs:567
    frame #4: 0x0000000100001f4e phrase_morph`alloc..raw_vec..RawVec$LT$u8$GT$::drop.4734::h4cc2a389747d163e + 14
    frame #5: 0x0000000100001f36 phrase_morph`collections..vec..Vec$LT$u8$GT$::drop.4731::h0e7931bf1e121016 + 6
    frame #6: 0x000000010000191a phrase_morph`collections..vec..Vec$LT$u8$GT$::drop.4714::h0e7931bf1e121016 + 26
    frame #7: 0x00000001000018f6 phrase_morph`collections..string..String::drop.4711::hd405353a65344567 + 6
    frame #8: 0x00000001000013ab phrase_morph`phrase_morph::vec::Vec<T>.Drop::drop(self=0x0000000102017ec0) + 203 at vec.rs:1434
    frame #9: 0x000000010000129e phrase_morph`collections..vec..Vec$LT$collections..string..String$GT$::drop.4137::h855aacd19e7ca5a1 + 14
    frame #10: 0x000000010000583b phrase_morph`phrase_morph::vec::Vec<T>.Drop::drop(self=0x00007fff5fbff620) + 203 at vec.rs:1434
    frame #11: 0x000000010000572e phrase_morph`collections..vec..Vec$LT$collections..vec..Vec$LT$collections..string..String$GT$$GT$::drop.5447::h3c6f44aaf7717b77 + 14
    frame #12: 0x0000000100005717 phrase_morph`typed_arena..ChunkList$LT$collections..string..String$GT$::drop.5444::h53c526fb72920c0f + 23
    frame #13: 0x0000000100007d56 phrase_morph`core..cell..UnsafeCell$LT$typed_arena..ChunkList$LT$collections..string..String$GT$$GT$::drop.5588::h7ba207dfd96b211a + 6
    frame #14: 0x0000000100007d4a phrase_morph`core..cell..RefCell$LT$typed_arena..ChunkList$LT$collections..string..String$GT$$GT$::drop.5585::h818462f7098ab527 + 10
    frame #15: 0x0000000100007d36 phrase_morph`typed_arena..Arena$LT$collections..string..String$GT$::drop.5582::h7c3ae6546b86bedc + 6
    frame #16: 0x0000000100005389 phrase_morph`phrase_morph::astar<words::WordSearch>(s=0x00007fff5fbff6d8) + 3513 at lib.rs:272
    frame #17: 0x0000000100001015 phrase_morph`phrase_morph::main + 405 at phrase_morph.rs:14
    frame #18: 0x0000000100030d93 phrase_morph`sys_common::unwind::try::try_fn::h13555085574443371706 + 35
    frame #19: 0x000000010002f3d9 phrase_morph`__rust_try + 9
    frame #20: 0x0000000100030c36 phrase_morph`rt::lang_start::h83533f65c37de1a9Q4x + 566
    frame #21: 0x00000001000190ca phrase_morph`main + 42
    frame #22: 0x00007fff8eba85c9 libdyld.dylib`start + 1
    frame #23: 0x00007fff8eba85c9 libdyld.dylib`start + 1

valgrind:

==82026== Invalid free() / delete / delete[] / realloc()
==82026==    at 0x1000AC2F7: free (in /usr/local/Cellar/valgrind/3.11.0/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==82026==    by 0x1000020B7: heap::deallocate::haeb3cd69f6908a804ea (heap.rs:111)
==82026==    by 0x10000203F: raw_vec::RawVec$LT$T$GT$.Drop::drop::h9181694532946307831 (raw_vec.rs:567)
==82026==    by 0x100001F4D: alloc..raw_vec..RawVec$LT$u8$GT$::drop.4734::h4cc2a389747d163e (in target/debug/examples/phrase_morph)
==82026==    by 0x100001F35: collections..vec..Vec$LT$u8$GT$::drop.4731::h0e7931bf1e121016 (in target/debug/examples/phrase_morph)
==82026==    by 0x100001919: collections..vec..Vec$LT$u8$GT$::drop.4714::h0e7931bf1e121016 (in target/debug/examples/phrase_morph)
==82026==    by 0x1000018F5: collections..string..String::drop.4711::hd405353a65344567 (in target/debug/examples/phrase_morph)
==82026==    by 0x1000013AA: vec::Vec$LT$T$GT$.Drop::drop::h8041253880683918705 (vec.rs:1434)
==82026==    by 0x10000129D: collections..vec..Vec$LT$collections..string..String$GT$::drop.4137::h855aacd19e7ca5a1 (in target/debug/examples/phrase_morph)
==82026==    by 0x10000583A: vec::Vec$LT$T$GT$.Drop::drop::h2483060143510938303 (vec.rs:1434)
==82026==    by 0x10000572D: collections..vec..Vec$LT$collections..vec..Vec$LT$collections..string..String$GT$$GT$::drop.5447::h3c6f44aaf7717b77 (in target/debug/examples/phrase_morph)
==82026==    by 0x100005716: typed_arena..ChunkList$LT$collections..string..String$GT$::drop.5444::h53c526fb72920c0f (in target/debug/examples/phrase_morph)
==82026==  Address 0x1048a5588 is on thread 1's stack

I'm rather suspicious of the unsafe block in astar. The mem::uninitialized winds up with an inferred type of String in your example, so it's deliberately creating a dangling string pointer, whereas the crash is coming from an attempt to drop a dangling string pointer. I do not think this is a coincidence.

2 Likes

Thanks. That's a huge help. I agree; the stack trace that I'm getting now with valgrind seems to indicate that something is happening with the String that is returned from the start method. I changed the entire unsafe block of astar to the following:

deque.push_front((*node.state).clone());

...and also had to add Clone to the trait requirements of S::Node. Is this significantly worse? It seems that it only has as many clones as the path is long, which is minimal compared to the prolific clones in the rest of the methods.

That seems superficially reasonable; I didn't look closely at what the rest of the astar code is doing, so there might be a better way. The question then is does it fix the crash?

1 Like

Yes, it works as expected now. I think I'll dig more deeply into the astar code later, but this isn't so performance-dependent so I'll stick with my local fork + clone at the moment.