Rust guarantees no segfaults with only safe code but it segfaults (stack overflow)


#1

This is a question I had for some time…

A few months ago I reported a segfault I observed in vagga.
The issue was that vagga swaps the default libc (glibc?) for musl, and musl has a small stack size.
Does Rust by default use something like GCC’s __morestack to grow its stack when it reaches the limit?
Or is this not particular to musl, and can it happen in Rust anyway?


#2

Really what Rust guarantees is memory safety; no undefined behavior caused by writing to or reading from memory in a way that is undefined by the language semantics.

One of the common ways that lack of memory safety manifests is via segfaults; if you have a dangling pointer that points to some unmapped memory, and dereference it, you will get a segfault. That’s actually one of the better failure modes for memory unsafety; a segfault simply kills your program and indicates why, so you can fix it. Much worse is a dangling pointer that points to valid memory, causing silent corruption of program state, sometimes in ways that are exploitable by outside attackers, sometimes in ways that are just very difficult to track down and debug.

Rust used to use __morestack to detect stack overflow and panic in a deterministic way when that happened (and even earlier, used to use __morestack for actual segmented stacks, though that went away a long time ago). However, in 1.4 Rust switched to using guard pages, so accessing memory beyond the stack (stack overflow) causes a deterministic segfault. Getting a segfault on stack overflow is not memory unsafe, it’s a deterministic failure just like killing your program on running out of memory to allocate.

Now, there’s one problem with that; the guard page segfault will only happen if you actually do write to the guard page. It’s possible, however, to have an allocation that goes past the end of the guard page; and now you may be exposed to memory safety issues again if you only write to the portion that is beyond the guard page, and that happens to be mapped memory. To prevent this, you can use stack probes, which are just extra accesses to memory added to a function if its stack frame takes up more than one page, to trigger a segfault in the guard page in a deterministic way, but it looks like those are not yet implemented on all platforms.

So, this is a soundness hole in the current implementation of Rust. It is possible to overwrite arbitrary memory if, for instance, you allocate a large buffer on the stack (like the 64k buffer used for std::fs::copy) while you are near the end of the stack. The stack grows downwards on most platforms, while zeroing out the buffer goes upwards, so you can overwrite some memory that is far beyond the end of the stack before finally getting to the guard page and getting your segfault.
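
To make the scenario concrete, here is a toy model of it, not what rustc or LLVM actually emit. It touches no real memory; it only classifies where an access would land. All names (`Landing`, `classify`, and so on) and the 4096-byte page size are assumptions made for the illustration.

```rust
// Toy model only: offsets are bytes below the current stack pointer.
// All names here are hypothetical and nothing touches real memory.

#[derive(Debug, PartialEq)]
enum Landing {
    Stack,       // still inside the thread's own stack
    GuardPage,   // unmapped guard region: the access faults
    BeyondGuard, // past the guard: may hit unrelated mapped memory
}

const PAGE: usize = 4096; // assumed page size

// Classify an access `offset` bytes below the stack pointer, given
// `remaining` bytes of stack left and a guard region of `guard` bytes.
fn classify(offset: usize, remaining: usize, guard: usize) -> Landing {
    if offset <= remaining {
        Landing::Stack
    } else if offset <= remaining + guard {
        Landing::GuardPage
    } else {
        Landing::BeyondGuard
    }
}

// No probes: the buffer is zeroed from its lowest address upward, so the
// very first write lands a full `frame` bytes below the stack pointer.
fn first_write_without_probes(frame: usize, remaining: usize, guard: usize) -> Landing {
    classify(frame, remaining, guard)
}

// With probes: touch one address per page, nearest the stack pointer
// first, and stop at the first access that leaves the stack.
fn first_fault_with_probes(frame: usize, remaining: usize, guard: usize) -> Landing {
    let mut offset = PAGE.min(frame);
    loop {
        let landing = classify(offset, remaining, guard);
        if landing != Landing::Stack || offset == frame {
            return landing;
        }
        offset = (offset + PAGE).min(frame);
    }
}
```

With a 64k frame, 8k of stack left, and a one-page guard, `first_write_without_probes` reports `BeyondGuard` while `first_fault_with_probes` reports `GuardPage`. Because the probes advance one page at a time, they cannot step over a guard region that is at least one page wide.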

Once that problem is fixed, you will still get a segfault when you overflow the stack and the stack probe hits that guard page, but it should happen before anything untoward has happened, simply killing your process.

edit to add: Since lots of people still see this post, it’s a good place to put an update on the current status.

After a number of exploits were publicized taking advantage of a one-page guard region on stacks (none that I know of affecting Rust code, but there is a possibility that Rust code could be similarly vulnerable), a good deal of attention has been paid to this issue. That led to a renewed push to get the LLVM patches for stack probes merged (1, 2), and to add support for stack probes to rustc. The latter is still in the process of being merged, but should land soon.

Additionally, many operating systems have applied patches which increase the guard region from a single page to 1 MB or more, reducing the chances that a buffer allocated on the stack (like the 64k buffer that had been used for std::fs::copy) would overlap with allocations beyond the guard region.

Anyhow, once PR #42816 is merged and released, Rust should be free of this problem, and should always segfault in a deterministic manner upon stack overflow.


#3

First of all, thanks for the response!

While I agree with that statement, what I am not OK with is that Rust then offers this false assurance/confidence that you won’t hit these problems.
The main page previously said something like “prevents nearly all segfaults*”, which was more accurate than the current claim: “prevents segfaults, guaranteed memory safety”.
Given these statements, you would think that Rust grows the stack when it reaches the limit, as one way of solving the problem.
I understand that it was removed, probably because of performance issues (probably related to threads), but… is there a way to opt in to the feature?
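
This is not an opt-in for growable stacks, but it is related: std lets you pick the stack size of a spawned thread up front through std::thread::Builder::stack_size. A minimal sketch, assuming a deliberately recursive function; `depth` and `run_with_stack` are made-up names and the 64 MiB figure used below is arbitrary.

```rust
use std::thread;

// Deliberately recursive: each call adds a stack frame
// (unless the optimizer flattens the recursion).
fn depth(n: u64) -> u64 {
    if n == 0 {
        0
    } else {
        1 + depth(n - 1)
    }
}

// Run `depth(n)` on a fresh thread whose stack is `stack_bytes` large.
// This preallocates a bigger stack; it does not make the stack growable.
fn run_with_stack(stack_bytes: usize, n: u64) -> u64 {
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(move || depth(n))
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}
```

For example, `run_with_stack(64 * 1024 * 1024, 100_000)` recurses 100,000 levels on a 64 MiB stack. Overflowing that larger stack would still kill the process; the limit is only moved, not removed.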

Again, I feel uneasy that Rust creates this false confidence in the code, especially thinking of people who don’t have experience with native code (particularly the issues most often met in C/C++, which is why some avoid C and C++ and go to managed languages).

Edit: Btw, in Rustonomicon we have listed a few operations that are harmful but it does not mention stack overflows. Scroll down to Rust considers it "safe" to:


#4

It could be argued that the front page is correct. “Prevents segfaults” is not the same as “prevents all segfaults”; it could mean all, some, or anything more than one. However, it does have a strong implication of “all”, so it could be seen as misleading even though not actually incorrect.

It is hard to explain in a short blurb, however, what “memory safety” really means. I’m not sure what the right phrasing should be. Segfaults are one of the most visible and well known symptoms of memory unsafety, and it does prevent those that are due to memory unsafety. But as discussed here, it is possible to have a segfault in a memory safe manner.

Yeah, stack overflow should probably be added to that list, since it is a fairly common bad thing to happen that people do need to keep in mind.

There’s also a line in the book that states that a segfault is guaranteed to be due to some interaction with unsafe code, but given stack overflows that’s really not quite correct.


#5

Sounds like a lawyer phrase :slightly_smiling:
And I must agree that it sounds a bit ambiguous and only a lawyer and a court can decide what it actually means.
Reminds me of some open-source licenses, like GPLv2. Well, you do say I should release the code and people can modify it, here it is, but you can’t modify the binary :stuck_out_tongue:. Here it is for ambiguity :beers: .

I think the words you were searching for are memory corruption.

I think that it’s OK that it segfaults (and does not corrupt the memory) as long as it is mentioned somewhere in the docs that this can happen.


#6

I’ve filed an issue on rephrasing the book and website to use a better term than “segfault”. I don’t know what phrasing would be better, but I agree that the current phrasing is not really a good fit.


#7

Here is another one: https://github.com/jwilm/racerd/issues/16 that references a stack overflow in racer caused by recursion (which, by the way, is a very easy issue to hit): https://github.com/phildawes/racer/pull/479.
I remember stack overflows even in Rust, in MIR.

Edit: Searched on Google, you can find some, here is one more.
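
Since runaway recursion is the usual trigger, one defensive pattern is to carry an explicit depth budget and return an error instead of overflowing. A minimal sketch, assuming a made-up `Expr` type; `TooDeep`, `MAX_DEPTH`, `eval`, and `nest` are all hypothetical names, and this is not how racer fixed its issue.

```rust
// Hypothetical nested expression type, for illustration only.
enum Expr {
    Num(i64),
    Neg(Box<Expr>),
}

// Error returned instead of recursing further.
#[derive(Debug, PartialEq)]
struct TooDeep;

// Arbitrary budget; a real limit would depend on frame and stack sizes.
const MAX_DEPTH: usize = 100;

// Evaluate `expr`, refusing to recurse more than MAX_DEPTH levels.
fn eval(expr: &Expr, depth: usize) -> Result<i64, TooDeep> {
    if depth > MAX_DEPTH {
        return Err(TooDeep);
    }
    match expr {
        Expr::Num(n) => Ok(*n),
        Expr::Neg(inner) => Ok(-eval(inner, depth + 1)?),
    }
}

// Build `Neg(Neg(...Num(1)...))` with `levels` layers of negation.
fn nest(levels: usize) -> Expr {
    let mut e = Expr::Num(1);
    for _ in 0..levels {
        e = Expr::Neg(Box::new(e));
    }
    e
}
```

Here `eval(&nest(3), 0)` gives `Ok(-1)`, while `eval(&nest(500), 0)` gives `Err(TooDeep)` rather than consuming ever more stack. The trade-off is that legitimately deep inputs get rejected.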


#8

The exact wording of the website is important to me. The current text is definitely implying all seg faults. Our default safe out of stack behavior is to abort. Isn’t that not a segfault?


#9

Maybe something along the lines of “prevents illegal memory access” would be better?


#10

@brson It does abort, but because the allocated memory goes outside its supposed range (it reaches the limit of the stack and allocates further), this is by definition a segmentation fault.


#11

Just jump to Update 2

This could be argued 2 ways:
1: You did access memory you should not have, but it was my memory, the guard, so I did prevent you from accessing an illegal memory location, therefore, no segfault
2: You claim that you prevent segfaults, that would mean that I won’t be able to access an illegal memory location, therefore, I should not be able to go outside my own stack size but I did. I did access an illegal memory location to my application. Me hitting one of the guards doesn’t mean I didn’t access an illegal memory location. This can be achieved even with a C++ compiler flag but they do not claim that the language does not segfault. The conclusion: segfault

I incline towards the second argument. It is a segmentation fault; it was not prevented, but its effect was mitigated by the guards. So, Rust is safe from memory corruption but not from segfaults.

Update:
To make it clearer: segfaults are not the dead bodies on the battlefield (the corruption of the memory, which the guards protect against) but the people yelling on the hills; they might shed some blood, or they might not. (A segfault is accessing an illegal memory location; guards are also illegal memory locations, though they are smartly placed to help prevent the blood from being shed.)
Not sure the analogy helps :slightly_smiling: , especially the second part.

Update 2:
I also understand your point. You think of it as if the guards are an extension of the application, therefore, you are accessing your own memory, just that you have some checks in there that make you crash.

But I incline to think of it as: [shadow memory placed by tools][Application][shadow memory]
The Application is the realm where the developer works in. If he hits the shadow memory, it is not what he intended to do, he didn’t expect that, it is a segmentation fault.

Update 3:
What the developer thinks is irrelevant.
The claims are made about Rust the Application, not Rust the Developer.
It does not claim that it will make the developer write software that runs blazingly fast, prevents segfaults, and guarantees thread safety. The Rust Application will accomplish these…
That it accomplishes these (except “runs blazingly fast”) by forcing the developer, most of the time, to write correct code with the help of the type system, borrow checker… is another matter.

Now, if we think of the guards as an extension of the application, the same way we would think of the garbage collector as an extension of the application in managed programming languages, then this does prevent segmentation faults, the same way a garbage collector “prevents memory leaks”.

The issue then would be that the message Rust prints is misleading, and should actually say something like "Guard Page Exception - A page of memory that marks the end of a data structure, such as a stack or an array, has been accessed."


#12

But guard pages rely on having an MMU. There are plenty of people writing code on microprocessors without MMUs (ARM R4 comes to mind).


#13

Rust makes no such claims about unsafe Rust. Although, you could hit this problem without any unsafe code…
Interesting…


#14

From what I understand, the guard is available only when you use std.
When you use #![no_std], can you still write safe code that will actually segfault (buffer overflow?), the bad kind of segfault? I don’t know much about programming on microprocessors…

My guess would be that the documentation should be updated, for example, the No stdlib chapter.
Is this expected? This reduces the scope of Safe Rust to applications that use std (when it comes to stack overflows).
Although it’s not an issue for me personally, it would be interesting to know the official view on this…


#15

A segmentation fault is any access to a region of memory that the process is not allowed to access. The guard page that causes the program to terminate upon stack overflow is just such a region in memory. Now, this particular segfault is safe and intended, but it’s still a segfault.

The additional factor that is confusing things here is that https://github.com/rust-lang/rust/issues/16012 is not yet implemented, which means it’s possible to get a segfault by accessing memory beyond the guard page, without having the normal handler invoked that prints an error message indicating a stack overflow. Once that issue is implemented, you will still get a segfault, but with an additional message that indicates why (at least, if you’re running a command line application, in a graphical application it may still be just as confusing).

There is already such a message; if you try running the following, you will get a message about stack overflow (playpen):

fn overflow(n: i64) {
    if n > 0 {
        overflow(n-1)
    }
}

fn main() {
    use std::thread;

    let child = thread::spawn(move || {
        overflow(999999999);
    });
    let _ = child.join();
}

That gives me:

thread '<unknown>' has overflowed its stack
playpen: application terminated abnormally with signal 11 (Segmentation fault)

It’s not quite as detailed as the message that Java prints out, but it does indicate that this is not an unexpected segfault, but rather just a stack overflow. (This example uses a separate thread because it looks like Linux maps memory for the main thread in a way that gives SIGBUS rather than SIGSEGV, but it’s really the same behavior, just with a different signal being sent.)

The reason that this message didn’t appear in the Vagga issue is because https://github.com/rust-lang/rust/issues/16012 is not implemented, so overflowing the stack can cause you to segfault by accessing a region beyond the guard page, in which case the normal stack overflow detection doesn’t kick in and you don’t get the nicer error message.

There is some discussion of that in https://github.com/rust-lang/rust/issues/16012 as well; the idea being allowing platforms to do stack-checks per function call rather than relying on guard pages, for those platforms that don’t support guard pages. However, as far as I know no progress has been made on that approach either.


#16

I would say that a SIGSEGV signal used as an implementation detail for performance reasons need not be called a “segfault” in introductory material. The panic is the user-visible behavior, so it’s fine to say that Rust panics on stack overflow, rather than segfaulting.

If we’re discussing how std is implemented, then it makes sense to talk about segmentation faults when explaining how to implement stack overflow checks using guard pages.

But that is a very specific topic. If a C or C++ programmer is visiting the Rust web site to find out what sort of language it is, they will take the term “segmentation fault” to mean one possible response (the best case!) of C and C++ implementations to memory errors. Guaranteeing the absence of those errors is one of Rust’s key selling points.

It doesn’t make sense to back off from a vivid, relatable term just because SIGSEGV signals happen to be used internally to detect stack overflows, with a temporarily buggy implementation that made them visible to the user.


#17

It doesn’t panic upon the segfault, and I don’t know of any plans to change that. After printing out the error message indicating why, it re-enables the system default signal handler, which kills the process with a return code corresponding to the signal and dumps core (if core dumps are enabled on your system). This is a bit more severe than a panic, it can’t be recovered from at all. So other than the extra explanatory message, this looks in all ways like a segfault.

As we see from the original post in this thread, it can be fairly confusing for people to see “prevents segfaults” on the front page and then run into a segfault, and even if you read the book and see this “If a Rust program segfaults, you can be sure the cause is related to something marked unsafe,” that will throw you off the trail if you had a stack overflow.

Whether the extra error message that you see when we don’t skip past the guard page is enough to alleviate the confusion is an open question. The exit code of the process still indicates SIGSEGV. Bash prints out Segmentation fault: 11 based on this. You’ll see the signal if you’re looking at it in a debugger. Other than a little extra explanatory text, it is still a segfault.
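
To show what a parent process gets to see, here is a minimal sketch of classifying a child’s exit status on Unix with std’s ExitStatusExt. `describe_exit` is a made-up helper, and the signal numbers are the Linux values, which is an assumption; other platforms may number them differently.

```rust
use std::os::unix::process::ExitStatusExt; // Unix-only extension trait
use std::process::ExitStatus;

// Signal numbers as used on Linux (assumption; other platforms may differ).
const SIGABRT: i32 = 6;
const SIGSEGV: i32 = 11;

// Summarize how a child process ended, as a shell or supervisor might.
fn describe_exit(status: ExitStatus) -> String {
    match (status.code(), status.signal()) {
        // The child called exit (or returned from main) with this code.
        (Some(code), _) => format!("exited with code {}", code),
        // The child was killed by a signal, so there is no exit code.
        (None, Some(SIGSEGV)) => String::from("killed by SIGSEGV"),
        (None, Some(SIGABRT)) => String::from("killed by SIGABRT"),
        (None, Some(other)) => format!("killed by signal {}", other),
        (None, None) => String::from("ended for an unknown reason"),
    }
}
```

The value returned by `std::process::Command::status()` could be passed straight to `describe_exit`. Whatever message the child printed beforehand, the parent only sees the signal, which is why the result still looks like a segfault from the outside.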

Anyhow, I don’t think it’s that huge an issue, but it could be a little disconcerting running across such a simple example that seems to directly contradict one of the main selling points on the front page. This coupled with the fact that a segfault is really a symptom rather than the problem, I wonder if there might be a better phrasing. “Prevents memory corruption”? “Prevents memory errors”? “Guarantees memory safety”?

So, anyhow, I don’t really have a better suggestion for the phrasing; just something about being so very specific about “segfaults” there feels off to me.


#18

Oh! I was mistaken about the handling of stack overflows. If bash prints “Segmentation fault” and dumps core, that’s a segfault, in the sense that the C or C++ programmer visiting the Rust web site would understand it. Changing the terminology seems necessary, just as you say.


#19

I didn’t know that we exited in this scenario like a normal segfault, and I think that behavior is wrong (when rust used segmented stacks for overflow detection we didn’t segfault). Can std’s handler be made to abort like other fatal errors?

cc @alexcrichton

Edit: In other words, the segfault is a platform-dependent implementation detail of Rust’s semantics for stack overflow.


#20

I’m at least not personally aware of a time when we ever treated a stack overflow as a panic, but I think this happened in the distant past, right? That is to say, at least while I’ve been working with std, I believe that a stack overflow has always translated to some form of an immediate abort of the process.

The biggest roadblock I can think of in turning a stack overflow into a panic is that we don’t know how much stack space the panic handler will take. Not only that, but we’re also currently running on a global stack which has to be preallocated ahead of time, so there’s not a lot of stack space itself to work within. Nowadays we also have custom panic handlers, so you’re perhaps running arbitrary code when a panic happens.

Overall I’ve at least personally felt that stack overflows are so niche and difficult to recover from that it’s basically not worth bending over backwards to support. I feel that we strike a good balance today between handling the error and being pragmatic about it.