An interesting article on undefined behavior

The article is about C and C++, but the introduction gives a nice explanation of why many attempts to avoid undefined behavior fail.

4 Likes

Thanks for the heads up. That is a wonderful article.

I think many of us here already knew much of that already and that is why we are here. But this is something I had not really thought about much before:

An ominous passage in the C++17 standard portends a looking-glass world where cause follows consequence: "[If an] execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation) " [emphasis added].

Given that, and everything else talked about in that article, I come to a horrifying conclusion: There are no C or C++ programs are is guaranteed to work. None.

Why?

Because it is almost certain that all non-trivial C/C++ programs have at least one instance of UB somewhere. Even the most teeny, tiny bit of UB hidden away in a some dusty old function that is almost never called and only triggered under the most obscure conditions if it is called gives the compilers writers carte blanche to do what they like with all the rest of the code.

After all, no matter how big your program is it is only a scaled up version of the examples given in the article where the reasons for the compiler eliding all your code are somewhat easier to see and easier to agree with.

4 Likes

I don't think your characterization is correct.
Only if the execution path contains UB is the compiler free to do anything.
So if there is some UB in some obscure part of a codebase, as long as it isn't called that's fine and the compiler has to be entirely semantics preserving.

5 Likes

But I said: "almost never called and only triggered under the most obscure conditions".

Which is to say it is on the call graph somewhere. If the compiler cannot prove to itself that it is never executed then the compiler has permission to run amok with the rest of your code.

2 Likes

Still incorrect. The compiler has permission to run amok if undefined behavior is actually executed. If it's not executed, there is no undefined behavior, even if the compiler cannot prove it.

If it cannot tell whether the bad bit is unreachable or not, then it must entertain the possibility that it really is unreachable and optimize accordingly.

7 Likes

I have to disagree. The compiler cannot know if the undefined behaviour is ever actually executed or not (Except in trivial cases). That will depend on some combination of inputs at run time that the compiler cannot be aware of.

The compiler is free to run amok at compile time and make mice meat out your code. At least that is what I gather from the paragraph in the C++17 standard in question, and as demonstrated to some extent by the examples in the article.

I start to think this misunderstanding must be responsible for a lot of bugs in C and C++ code.

In theory, yes, the compiler is free to do anything.

In practice, undefined behavior is generally used for only one thing: optimizing the program, assuming that it is unreachable. And, yes, this optimization can easily go wrong - e.g., if compiler is able to prove that every path in program contains UB, the program can be compiled down to single trap instruction. But in general, if UB is triggered by some inputs, then any other code path must assume that these inputs are impossible - nothing more, nothing less.

2 Likes

It doesn't matter whether the compiler knows or not. If there's no UB, then there's no UB.

Consider a program that does the following:

  1. prints "hi"
  2. Computes complicated expression resulting in a boolean x
  3. If x == true, print "bye" and exit the program
  4. Trigger undefined behavior

If we compile this program and run it with an input that ultimately leads to the boolean taking the value true, then the program is required to print "hi", then "bye" and exit. Anything else would be incorrect according to the specification, and it doesn't matter whether or not the compiler can figure out when the boolean is true and false.

Of course, if we ran the program with an input that would ultimately lead to the boolean being false, then the compiler can do anything, including not printing "hi".

17 Likes

If it doesn't know, it has to compile the code in such a way that when the semantics of the code that is actually executed turns out to be well defined, it is executed correctly. So it can't "run amok" and do whatever it wants in such a case.

I could well be misinterpreting the C++17 standard. But when it says:

"[If an] execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation) "

I take that as permission to do whatever, with any and all of the program, if the compiler detects some undefined operation in the source.

I take it that the program is NOT required to print "hi" then "bye" in your example.

Consider a different example:

  1. Calculate something complicated, say X.
  2. Print X.
  3. Calculate something complicated, say Y.
  4. Print Y
    ...
  5. Calculate something else complicated, say U.
  6. Print U. But this action contains UB.
  7. Exit.

Conceivably the compiler could decide to optimize things in such a way that wants to do all the calculations, X, Y... first and then print out the results.

This is allowed by the standard as the observable behaviour of the program is as the programmer intended.

But that UB is something a programmer should not write. So according to that paragraph in the standard and as the article demonstrates the compiler is free to do what it likes with all that "preceding" code.

If we take the example where x is true, then the condition "[If an] execution contains an undefined operation" is not satisfied, so the part about operations that precede the first undefined operation does not apply in that case.

2 Likes

In this case, since UB is always executed, compiler is free to optimize the whole program away and not only don't print, but also don't calculate anything, that's right. But is this "always executed" case the one you're worried of?

I don't follow that reasoning at all.

If one writes something in C or C++ that performs an operation whose behaviour is undefined in the standard then surely it is the opposite of "well defined". Surely there is no way to execute it correctly. That is why the call it "Undefined Behaviour".

A simple example would be using an uninitialised variable. There is no "correct" execution of that.

So it can "run amok" and do whatever it wants in such a case. As the quoted standard paragraph says.

Here, consider another program:

fn main() {
    let arr = [0, 1, 2];
    println!("{}", get_unchecked(&arr, 0));
}

fn get_unchecked(arr: &[i32], idx: usize) -> i32 {
    let ptr = arr.as_ptr().add(idx);
    *ptr
}

As I understand the interpretation you are suggesting, the following line of thought would be ok:

  1. Okay, let's try to optimize get_unchecked.
  2. I can deduce that UB is triggered if idx >= arr.len().
  3. I do not know whether idx is greater than the length of arr or not.
  4. Thus this program might have UB.
  5. Thus I can compile this program to reformat your disk.
4 Likes

Not sure. Sounds bad enough already not to have to worry about "possibly executed UB".

Let's see...

In the case that the compiler wants to rearrange things so as to calculate everything first and then print everything, if there is a UB in that printing, such as use of a rogue pointer, then all those results waiting in memory to be printed can be clobbered on hitting the UB causing errors and or crashes. That UB may be conditional but when it happens you are toast w.r.t everything.

So we have a situation where the source as you read it indicates, calculate, print, calculate, print, ... but what you can get is garbage output from the get go.

Because the compiler has rearranged the temporal order of operations to not be the order they appear in the source text.

This of course is already possible with modern optimisers. Although we don't expect to see it across whole programs as my example indicates.

I interpreted that standards paragraph as pointing out that the compiler is free to do this kind of thing. On that kind of scale.

That is perhaps a slightly different argument to my statement about having carte-blanch to generate what they like on seeing UB in the code. But doesn't it amount top the same thing in the end? Either the UB is hit at execution time on rare occasions and your entire programs output is toast at that point. Or the optimiser throws away all that code at compile time and the output is always toast. I guess the latter would be better!

Ah, is there some confusion here? I have been talking about UB in C and C++. That example is Rust.

Does the Rust spec say the same or different about UB? Is there a Rust spec?

It's the same in Rust, but feel free to imagine that it was C.

Yes. The crucial part is "when it happens", i.e. "when the input is one that triggers UB". If the input is such that the program doesn't touch UB-inducing code, then the program must execute correctly, so compiler can't change everything - it must leave the "happy path" intact.

OK.

I guess I am going to buy your argument, compiler writers are not free to do what they like with your code on the suspicion that UB may be executed in some circumstance.

But I still have a nagging suspicion about that C++ standards paragraph.