Help to find root cause of "illegal hardware instruction"


#1

I met a strange illegal hardware instruction that should be caused by my use of libc (I tested 0.2.36 and 0.2.40), this only happens on the release build.

The core code is as follows:

extern crate libc;
extern crate chrono;

use chrono::prelude::*;

#[cfg(not(all(target_os = "linux", target_arch = "x86_64")))]
#[inline]
pub fn get_cur_time_ms() -> i64 {
    let dt = Local::now();
    dt.timestamp() * 1000 + dt.timestamp_subsec_millis() as i64
}

#[cfg(all(target_os = "linux", target_arch = "x86_64"))]
pub fn get_cur_time_ms() -> i64 {
    use libc::*;
    unsafe {
        let mut tv: timeval = ::std::mem::uninitialized();
        let mut tz: timezone = ::std::mem::uninitialized();
        let _tz = &mut tz as *mut timezone as *mut c_void;
        gettimeofday(&mut tv, _tz);
        return tv.tv_sec * 1000 + tv.tv_usec / 1000;
    }
}

And the profile for release build is:

[profile.release]
codegen-units = 1
debug = false
debug-assertions = false
lto = true
opt-level = 3
panic = "unwind"
rpath = false

The problem is that when invoking on the 2nd version of get_cur_time_ms on one of my projects (as a dependent crate for the final project crate) in release mode, it crashes with 311 illegal hardware instruction. And this only happens recently when I upgrade rustup and crates versions (including libc to 0.2.36). However, when I tried to write a minimal working example by adding a main for the code snippets above, it does not crash.

My questions:

  1. Am I using libc the wrong way to get the current time?
  2. If not, what are the possible reasons of this crash?

BTW, I admit it may not be good idea to specialize Linux x86_64 function in this way since the performance differences may be negligible, however I would like to know the possible root causes of the crash.


#2

How do you know it crashes at the gettimeofday call? Do you have other unsafe code in the vicinity?

Your gettimeofday usage seems fine - you can pass std::ptr::null_mut() for the timezone (since you don’t care about that return value), but that’s not a safety issue.

Do you happen to know what the illegal instruction is? You also mentioned upgrading rustup - I take it you mean rust itself? To that end, what version and channel of rustc are you using?

Another thing worth trying is disabling LTO - does it still crash?


#3

It should be the use of 2nd version of get_cur_time_ms since: when I directly use 1st version there is no crash and the logging message ends before the call of this function although another logging is immediately after this function call.

Unfortunately I don’t know what the instruction is.

Yes, I updated rust nightly from rustup, nightly-2018-05-21-x86_64-unknown-linux-gnu. However it might start working wrongly about ten days ago (I update rust weekly but cannot remember exact version) when I upgraded the OS from Ubuntu 16.04 to Ubuntu 18.04.

Disabling LTO seems not working.


#4

I think this could be caused by use of mem::uninitialized(). LLVM’s optimizer is known to sometimes do funny things with code that uses uninitialized memory, such as assuming that said memory region cannot be accessed and replacing code that accesses it with garbage.

To test this hypothesis, you can replace calls to mem::uninitialized() with calls to mem::zeroed() and see if it fixes your problem.


#5

Note that this error (I assume) derived from having received a SIGILL signal. The most common cause of that is an alignment fault, rather than (as the text implies) trying to execute a bad machine opcode.


#6

Can you look at the generated core file with gdb and see where it says the fault is?

Also, if you can pinpoint which nightly version first starts crashing, it’ll help folks determine the cause (if this is, indeed, a compiler bug).


#7

Ok, I know what the problem is - this reproduces on the playground as well.

The issue is that libc::timezone is an uninhabited type (initially I didn’t even bother checking how this type is defined there):

pub enum timezone {}

Doing let tz: timezone = std::mem::uninitialized() (or zeroed(), doesn’t matter) is immediate UB; the compiler places the ud2 instruction, which is the canonical “trap” instruction for UB.

So the fix is actually what I suggested earlier, albeit I claimed it wasn’t for safety :slight_smile::

let mut tv: timeval = ::std::mem::uninitialized();
gettimeofday(&mut tv, std::ptr::null_mut());

#8

Exactly like what you said!
There is an ud2 instruction after I checked the rust-lldb backtrace! And I finally fixed the issue according to your suggestion.

@vitalyd @HadrienG @dcarosone And thank all of you for the help!


#9

I’ve no idea why libc would place that footgun there - most people, as we did, would initially assume that timezone is the struct timezone that gettimeofday() expects.

Now that you got a bit of experience with rust-lldb, anytime you see the ud2 instruction it means there’s UB and it’s time to go fishing for it :slight_smile:.


#10

Yep :smile: Thanks!


#11

FWIW, I filed https://github.com/rust-lang/rust/issues/50976 to see if this change in codegen was intentional and whether the compiler (or clippy) could catch this.