I have 2 crates written for work, one containing a parser and the other an interpreter.
Both crates have unit tests, and the interpreter has integration tests as well.
The interpreter also has a binary defined which essentially acts as a CLI-based REPL.
When I run any of the tests on Linux, all is well and it completes fine.
The same tests, however, segfault on OS X when I build the crate with cargo build --release.
When I build the binary in release mode and start using the repl, the binary also segfaults.
Here's the odd part: the parser crate uses no unsafe code at all, nor types such as Cell and RefCell.
Now, I've traced the issue to this function:
    pub fn read_regex(&mut self, regex: &Regex) -> LexResult<Token> {
        self.ensure_has_input()?;
        let (m_start, m_end) = match regex.find(self.remaining_source_code()) {
            Some(regex_match) if regex_match.start() == 0 =>
                (regex_match.start(), regex_match.end()),
            _ => return Err(LexErr::ExpectedRegexMatch {
                position: self.position()
            }),
        };
        let start = self.position() + m_start;
        let end = self.position() + m_end;
        self.position = end;
        Ok(Token::new(start..end))
    }
Specifically, the segfault seems to happen during the regex.find(self.remaining_source_code()) part of the code, and I've verified that self.remaining_source_code() does not segfault. Thus deduction would point to the regex.find() as the origin of the segfaults. However, the regex crate is likely one of the most-used crates in the Rust ecosystem. So while a bug is definitely not impossible, it's also not particularly likely.
There is a possible alternative, which is a bug in rustc.
However, I'm not sure how to find out if it's in the regex crate or rustc at this time, or perhaps a 3rd alternative I haven't even considered.
I would like to point out that your investigation does not yet rule out the possibility of a bug in self.remaining_source_code(). This function may have appeared to succeed while returning an invalid string that regex then chokes on.
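One way to test that hypothesis is to re-validate the slice before it reaches regex. The following is only a minimal sketch: describe_slice and position_is_valid are hypothetical helpers of mine, not part of the original lexer, and I'm assuming the lexer keeps a byte offset into the source string.

```rust
use std::str::Utf8Error;

/// Hypothetical helper: re-runs UTF-8 validation on a slice and returns its
/// length in characters. In purely safe code this should never fail, so an
/// `Err` here would hint at memory corruption or unsafe code elsewhere.
fn describe_slice(s: &str) -> Result<usize, Utf8Error> {
    let checked = std::str::from_utf8(s.as_bytes())?;
    Ok(checked.chars().count())
}

/// Hypothetical helper: checks that a byte offset lies on a character
/// boundary of `source` (the end of the string counts as a boundary).
fn position_is_valid(source: &str, position: usize) -> bool {
    source.is_char_boundary(position)
}
```

Calling both right before regex.find, and logging the result, would show whether the input is already broken by the time the regex crate sees it.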
Segfaults only tell you the point in your program where a certain type of invalid operation (e.g. invalid memory access) occurred. They do not tell you which part of the code made the mistake that led to this invalid operation.
EDIT: Also, if you enable debug information with something like RUSTFLAGS="-g", a debugger might be able to tell you where in regex.find you are segfaulting, which could provide additional diagnostic information. However, the output of debuggers may be difficult to interpret in optimized builds. Does the crash also occur in debug builds?
Can you provide the source code of your application, or is it closed source?
If so, can you compile your application with clang sanitizers? That would definitely help to detect exactly where invalid memory is read.
Edit: Apparently you can only do the sanitizer thing on Linux, but can you try it anyway, in case the program only happens to run by chance on Linux?
I'm attempting to do so now, but I'm running into a wall: lldb is borked, and to update it I need to update Xcode (oh joy), which in turn requires an OS update...
And when I tried to install gdb, brew basically told me the same thing for once.
At the very least it'll be a few hours in order to get all that done, as OS X is not particularly speedy when doing major upgrades in my experience.
That's assuming it all goes well of course, and I'm not particularly trusting of Apple's competency regarding OS X stability, especially the last few years.
I'll post an update when I have it all sorted out.
@roflcopter The license is proprietary, but additional context wouldn't help anyway since it's all 100% safe code. Segfaults should be impossible.
Besides that though, I don't see how clang's sanitizers can help with rust code? Aren't those kinds of analyses rather PL-specific?
That reminds me: stack overflows are one common source of segfaults in safe code, as the part of the rust runtime which detects them and translates them into aborts is not yet bullet-proof.
A debugger backtrace would tell you immediately if that is the issue, but if that is not available, another way is to adjust your OS' stack size limit and check if it affects program behaviour (not sure how that is done on OSX).
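If changing the OS limit is awkward, a rough alternative is to run the suspect code on a thread with an explicitly chosen stack size and see whether the crash moves or disappears. This is only a sketch: run_with_stack and depth are hypothetical names, and depth merely stands in for whatever deeply recursive work the real program does.

```rust
use std::thread;

/// Hypothetical helper: runs `f` on a new thread whose stack has
/// `stack_bytes` bytes and returns the closure's result.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}

/// Stand-in for deeply recursive work, e.g. a recursive-descent parser.
fn depth(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + depth(n - 1)
    }
}
```

For example, run_with_stack(64 * 1024 * 1024, || depth(10_000)) runs the recursion with a 64 MiB stack. If the segfault only shows up with small stack sizes, a stack overflow becomes the prime suspect.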
Beyond that, well, since a segfault is a thin abstraction of a CPU fault, it can mean many other things: invalid instructions (e.g. use of a vector instruction set which is inappropriate for the host machine), memory safety bug in some other code that you are indirectly using...
Good idea! In fact, I think this may have been a critical insight in understanding the issue.
I went back to rustc 1.25.0-nightly (616b66dca 2018-02-02) and added the various now-stablish feature attributes back in.
Then I compiled again, and bingo, my REPL works as it should.
I think that from this we can conclude it's a rustc issue.
But I don't have enough information to even submit a bug report. What exactly is going wrong here? Why do newer rustc versions cause segfaults at all on OS X? And why does it manifest only then?
In order to gain more information, I'd like to do something like a bisect install for the range of rustc versions between 2018-02-02 and now. Is anything like that available in rustup?
There is a utility called rust-bisect that automates some of the process.
Honestly, I myself just do it manually and call rustup override set nightly-YYYY-MM-DD && cargo run with different dates. Unless your project takes forever to build, it's not that tedious, and I'm there to catch anything unexpected (like a "failure" for the wrong reason).
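For reference, the manual procedure is just a binary search over nightly dates. A minimal sketch of that search, assuming the candidates are ordered and that every nightly after the first bad one is also bad (first_bad is a hypothetical name, and the check itself is left to the caller):

```rust
/// Hypothetical helper: returns the index of the first candidate for which
/// `is_bad` returns true, assuming all earlier candidates are good and all
/// later ones are bad. Returns `None` if no candidate is bad.
fn first_bad<T>(candidates: &[T], mut is_bad: impl FnMut(&T) -> bool) -> Option<usize> {
    let (mut lo, mut hi) = (0, candidates.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if is_bad(&candidates[mid]) {
            // The first bad candidate is at `mid` or earlier.
            hi = mid;
        } else {
            // Everything up to and including `mid` is good.
            lo = mid + 1;
        }
    }
    if lo < candidates.len() {
        Some(lo)
    } else {
        None
    }
}
```

In practice the closure would be me running rustup override set nightly-YYYY-MM-DD && cargo run for the given date and reporting whether it segfaulted, so each step of the search costs one build.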
Well, that was eerie -- seeing that date (2018-03-15) reminded me of when I tracked down what was ultimately not a bug in rustc but a bug in the winit crate. Perhaps it's also related to the ! type?
1. The crate in the issue by palango uses dyld, a library that provides a neat way to load a dynamic library. My crate has no need for this feature at the moment.
2. It also manages to produce a backtrace, something my binary doesn't do when it segfaults.
3. My crate doesn't use !, at least not directly.
4. My crate doesn't use any unsafe code directly.
5. The Cargo.lock of my crate hasn't changed during e.g. the bisection I performed earlier.
FTR, other than point 5 I do not know which of these differences are significant and which are not.
Given the list above, if this issue turns out not to be a rustc bug, then the segfault would very likely originate in a (transitive) dependency of my crate. And in that case, I would look at the regex crate again (see one of my earlier posts).