I'm sorry I can't give a simple code sample; the project is too big. The error keeps occurring inside one function.
Sometimes it panics at the let old_size = self.inner().strong.fetch_add(1, Relaxed); line, but mostly it just calls abort().
I print some debug messages before cloning the pointer. The code clones the Arc 39 times, but the value of strong confuses me.
Why does it abort when I clone the Arc pointer?
Any help would be greatly appreciated.
old_size > MAX_REFCOUNT means one of the following happened:
1. You're rapidly leaking newly cloned Arc handles. MAX_REFCOUNT is isize::MAX (2^63 - 1) on 64-bit machines, so even leaking one handle every nanosecond would take about 292 years to reach it. No Rust program has been running that long, but it's possible in theory.
2. There is a bug in the Arc implementation that corrupts the refcount. All software has bugs, so we should consider it.
3. Some unrelated unsafe code (like C code) has a bug that overwrites memory it must not touch. Yes, this is one of the things that can happen with UB. You may need to revisit all the C code and unsafe {} blocks to find the actual bug.
If you have a good machine with a dozen CPU cores and run them all at 100% on a program that does nothing but leak a single Arc, it would still take decades before it aborts. If it aborts within a day, leaking is not your problem.
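To make the arithmetic for the leaking case concrete, here is a small sketch. It assumes a fixed cost per leaked clone and perfect scaling across threads (optimistic for a single contended atomic counter), and the helpers leak_clones and seconds_to_overflow are hypothetical names. It also shows what "leaking" means here: mem::forget on a clone skips the decrement that Drop would otherwise do.

```rust
use std::mem;
use std::sync::Arc;

const SECS_PER_YEAR: f64 = 60.0 * 60.0 * 24.0 * 365.25;

/// Hypothetical helper: seconds needed to push a refcount up to `limit`
/// when each leaked clone costs `nanos_per_leak` ns and `threads` threads
/// leak in parallel (assumes perfect scaling, which is optimistic).
fn seconds_to_overflow(limit: u64, nanos_per_leak: f64, threads: u32) -> f64 {
    limit as f64 * nanos_per_leak * 1e-9 / threads as f64
}

/// Hypothetical helper: leaks `n` clones of `a`. Each clone increments the
/// strong count, and mem::forget skips the decrement Drop would have done.
fn leak_clones<T>(a: &Arc<T>, n: usize) {
    for _ in 0..n {
        mem::forget(Arc::clone(a));
    }
}

fn main() {
    let a = Arc::new(0u8);
    leak_clones(&a, 39);
    // 1 original handle + 39 leaked clones
    println!("strong count: {}", Arc::strong_count(&a)); // prints "strong count: 40"

    let limit_64 = i64::MAX as u64; // isize::MAX on a 64-bit target
    let limit_32 = i32::MAX as u64; // isize::MAX on a 32-bit target
    println!("64-bit, 1 thread:   {:.0} years", seconds_to_overflow(limit_64, 1.0, 1) / SECS_PER_YEAR);
    println!("64-bit, 12 threads: {:.0} years", seconds_to_overflow(limit_64, 1.0, 12) / SECS_PER_YEAR);
    println!("32-bit, 1 thread:   {:.1} seconds", seconds_to_overflow(limit_32, 1.0, 1));
}
```

At an assumed 1 ns per leak this works out to roughly 292 years on 64-bit, about 24 years with 12 threads, and only about 2 seconds on 32-bit, which is why leaking is only a realistic explanation on 32-bit targets.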
If you don't have any unsafe code, your dependencies may. Check your dependencies, find the ones that use unsafe code but aren't extensively tested, and debug them if you can't replace them.
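For orientation, the abort comes from a guard right after the fetch_add line quoted in the question. The sketch below is a simplified, hypothetical model of that guard, not the std source: Counter and CloneOutcome are made-up names, and it returns a value where Arc::clone would call abort(). MAX_REFCOUNT uses the same value as std (isize::MAX). It illustrates why a single stray write that sets the top bit of the counter is enough to trip the check, long before any real leak could.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Same limit std uses for Arc: isize::MAX.
const MAX_REFCOUNT: usize = isize::MAX as usize;

/// Hypothetical stand-in for the strong counter inside an Arc allocation.
struct Counter {
    strong: AtomicUsize,
}

/// Hypothetical result type; the real Arc::clone aborts instead of returning.
#[derive(Debug, PartialEq)]
enum CloneOutcome {
    WithinLimit { old_size: usize },
    WouldAbort { old_size: usize },
}

impl Counter {
    /// Models the increment-then-check sequence in Arc::clone.
    fn clone_ref(&self) -> CloneOutcome {
        let old_size = self.strong.fetch_add(1, Ordering::Relaxed);
        if old_size > MAX_REFCOUNT {
            CloneOutcome::WouldAbort { old_size }
        } else {
            CloneOutcome::WithinLimit { old_size }
        }
    }
}

fn main() {
    // A healthy counter: the original handle plus 39 clones.
    let healthy = Counter { strong: AtomicUsize::new(40) };
    println!("{:?}", healthy.clone_ref()); // prints "WithinLimit { old_size: 40 }"

    // The same counter with its top bit flipped by a stray write.
    let corrupted = Counter { strong: AtomicUsize::new(40 | 1usize << (usize::BITS - 1)) };
    println!("{:?}", corrupted.clone_ref()); // a WouldAbort variant
}
```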
1. Isn't possible on a 64-bit system (very possible on 32-bit, though).
2. Is extremely unlikely. It could be a compiler bug (but that is still very unlikely). Are you using a nightly compiler with features enabled (some of those have known bugs)? Are you compiling for an uncommon platform (like embedded)?
It's most likely 3.
So it's very likely a bug in unsafe code, or in an unsafe block that calls non-Rust code (it doesn't have to be C). The bug could sit in what appear to be unrelated places in your code, but those places are likely called before this one. It could also be in an unsafe block of a dependency you use (the unsafe code in the standard library typically has a lot of testing and validation, so that's the last unsafe code I would dig into). My guess is that somewhere the Arc itself, rather than the value it holds, is being turned into a pointer and passed to some external code, which then corrupts the Arc's counters. Another possibility is a use-after-free. Both of those should only be possible within an unsafe block.
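Since the corruption probably happens earlier than the clone that aborts, one way to narrow it down is to check the counts at many call sites and fail at the first one that sees garbage. This is a hypothetical debugging helper, not part of std: checked_clone and SANE_LIMIT are made-up names, and the limit is an assumption you would tune to your program. The counts are read racily, so treat it as a heuristic; it also cannot detect corruption that leaves the values small.

```rust
use std::panic::Location;
use std::sync::Arc;

/// Hypothetical bound: far more handles than this program should ever hold.
/// Tune it for your program; it is not a std constant.
const SANE_LIMIT: usize = 1 << 20;

/// Hypothetical debugging helper: panics with the caller's location if the
/// counts look implausible, then clones as usual.
#[track_caller]
fn checked_clone<T>(a: &Arc<T>) -> Arc<T> {
    let strong = Arc::strong_count(a);
    let weak = Arc::weak_count(a);
    assert!(
        strong > 0 && strong < SANE_LIMIT && weak < SANE_LIMIT,
        "Arc counts look corrupted at {}: strong={strong}, weak={weak}",
        Location::caller()
    );
    Arc::clone(a)
}

fn main() {
    let shared = Arc::new(String::from("config"));
    // Clone 39 times through the checked helper instead of Arc::clone.
    let copies: Vec<Arc<String>> = (0..39).map(|_| checked_clone(&shared)).collect();
    println!("strong = {}", Arc::strong_count(&shared)); // prints "strong = 40"
    drop(copies);
}
```

Replacing Arc::clone with such a helper in suspicious code paths moves the failure from a bare abort() to a panic message naming the first call site that observed bad counts.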
Try running your program under Valgrind. It may not catch invalid accesses to the Arc directly, since the counter is still within an allocated, program-accessible area, but if something sprays memory badly and also hits other addresses, it could be caught.
This error might also be caused by unsafe/FFI code decrementing the count manually or causing a double free with unbalanced Arc::from_raw.
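A balanced pattern for handing an Arc across an FFI-style boundary looks roughly like the sketch below. The handle_* functions are hypothetical names for illustration (in a real program they would likely be extern "C" functions); Arc::into_raw, Arc::from_raw and Arc::increment_strong_count are real std APIs. The rule it illustrates: every raw pointer that carries a strong reference must go back through Arc::from_raw exactly once. A missing call leaks; an extra call drops the count below the real number of owners and eventually double-frees.

```rust
use std::sync::Arc;

/// Hypothetical FFI-style constructor: gives foreign code one strong reference.
/// into_raw moves ownership into the pointer without decrementing the count.
fn handle_new(value: u32) -> *const u32 {
    Arc::into_raw(Arc::new(value))
}

/// Hypothetical: foreign code takes an additional reference.
/// Safety: `ptr` must come from handle_new and still be alive.
unsafe fn handle_retain(ptr: *const u32) {
    // SAFETY: caller guarantees ptr came from Arc::into_raw and is live.
    unsafe { Arc::increment_strong_count(ptr) };
}

/// Hypothetical: gives Rust code its own Arc without consuming a foreign reference.
/// Safety: same as handle_retain.
unsafe fn handle_to_arc(ptr: *const u32) -> Arc<u32> {
    // Bump first, so the Arc built by from_raw owns a fresh reference.
    unsafe {
        Arc::increment_strong_count(ptr);
        Arc::from_raw(ptr)
    }
}

/// Hypothetical: releases exactly one reference held by foreign code.
/// Safety: call exactly once per handle_new / handle_retain.
unsafe fn handle_release(ptr: *const u32) {
    // SAFETY: balanced with one into_raw or increment_strong_count.
    drop(unsafe { Arc::from_raw(ptr) });
}

fn main() {
    let p = handle_new(7); // strong = 1
    unsafe {
        handle_retain(p); // strong = 2
        let a = handle_to_arc(p); // strong = 3
        println!("value = {}, strong = {}", a, Arc::strong_count(&a)); // prints "value = 7, strong = 3"
        drop(a); // strong = 2
        handle_release(p); // strong = 1
        handle_release(p); // strong = 0, value freed
        // A third handle_release(p) here would be the unbalanced from_raw bug.
    }
}
```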