C++ pitfalls hard to avoid that are elegantly managed in Rust

Hi,

A friend of mine asked me if I could show some examples of C++ coding pitfalls that are hard to avoid in a proper, elegant way in C++ (that is, without nasty or overcomplicated code) but that the Rust compiler checks for and prevents natively and elegantly.

For example, returning a reference to a local variable can be avoided just by using a smart pointer, so that would be a not-so-nice example; I mean, it could be considered a barely acceptable one :blush:.

I was not able to find a nice and comprehensive post on this forum or elsewhere on the web, so I was wondering if somebody could help me.

Thank you.

4 Likes

How about:

x = a[i];

Oops, out of bounds.

x = x + 1;

Oops, overflow.

p = q;

Oops, some silent coercion lost data.
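
For comparison, here is a minimal sketch of how the same three operations can be written in Rust so that the failure case is visible in the type. The helper names (read_at, increment, narrow) are made up for illustration. Note that plain a[i] in Rust panics on an out-of-bounds index rather than reading stray memory, x + 1 panics on overflow in debug builds, and assigning between different integer types without a conversion is a compile error.

```rust
use std::convert::TryFrom;

/// Bounds-checked read: `get` returns `None` instead of reading past the end.
fn read_at(a: &[i32], i: usize) -> Option<i32> {
    a.get(i).copied()
}

/// Overflow-checked increment: `checked_add` returns `None` on overflow.
fn increment(x: i32) -> Option<i32> {
    x.checked_add(1)
}

/// Explicit narrowing: `try_from` fails if the value does not fit in an i32.
fn narrow(q: i64) -> Option<i32> {
    i32::try_from(q).ok()
}

fn main() {
    let a = [10, 20, 30];
    println!("{:?}", read_at(&a, 1));
    println!("{:?}", read_at(&a, 7));
    println!("{:?}", increment(i32::MAX));
    println!("{:?}", narrow(1_i64 << 40));
}
```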

That's before we start to talk threads....

7 Likes

A thing that happened to me in C++ is the following:

std::vector<X> v = ... // Some vector with elements
const X& head = v.front();
// lots of code
v.emplace_back(...);
// lots of code
head.method(); // Explodes sometimes

This is a pitfall that's relatively hard to avoid: emplace_back may reallocate the vector's storage, which leaves head dangling. Basically anything in C++ can be moved away from under your feet. I see two inelegant solutions for this particular example:

  1. Make a copy of the first element (if type X supports that)
  2. Or "drop" the head reference before calling emplace_back and get a fresh reference after the call. This would require putting the first head reference, and all the code that follows it up to the emplace_back call, inside braces:
std::vector<X> v = ... // Some vector with elements
{
    const X& head = v.front();
    // lots of code
}
v.emplace_back(...);
// lots of code
const X& head = v.front();
head.method();

And of course you have no guarantee that the next code edit doesn't break it again.

In contrast, this is super elegant in Rust:

let mut v = vec![1,2,3];
let head = v.first().unwrap();
// lots of code
v.push(4);
// lots of code (any use of the old "head" here wouldn't compile)
let head = v.first().unwrap();

Two Rust features make this nice: the compile-time guarantee that the (possibly invalid) head cannot be used after the call to push, and the ability to shadow variables, so that we don't need to create a new scope to get a fresh head.

6 Likes

fn close(self) consumes self, so faulty code like this does not compile:

db.close();
db.query(); // use of moved value error
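
A fuller sketch of that pattern, assuming a hypothetical Db type (the struct, its field and its methods are invented here for illustration, not taken from any real library):

```rust
/// Hypothetical database handle, used only for illustration.
struct Db {
    queries_run: u32,
}

impl Db {
    fn open() -> Db {
        Db { queries_run: 0 }
    }

    /// Borrows the handle, so it stays usable afterwards.
    fn query(&mut self) -> u32 {
        self.queries_run += 1;
        self.queries_run
    }

    /// Takes `self` by value: the handle is moved in and cannot be used again.
    fn close(self) -> u32 {
        self.queries_run
    }
}

fn main() {
    let mut db = Db::open();
    db.query();
    db.query();
    let total = db.close();
    println!("ran {} queries", total);
    // db.query(); // would not compile: `db` was moved by `close`
}
```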
13 Likes

I think many examples fall flat if you think of returning a shared pointer (or by value) as acceptable. But many C++ programmers don't think the performance penalty is acceptable. :wink:

My prime examples would be:

  • Anything involving std::move
  • Inserting / Removing from a container while you're iterating over it
  • Anything involving multithreading
  • Nullability (How do optional references work in std-C++...)
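
Two of the items above can be sketched briefly in Rust. This is only an illustration with made-up function names: removal during iteration is typically expressed with Vec::retain, and an "optional reference" is simply Option<&T>.

```rust
/// Removes even numbers in place. `retain` does the removal itself, so there
/// is no live iterator that could be invalidated.
fn drop_evens(v: &mut Vec<i32>) {
    v.retain(|x| x % 2 != 0);
}

/// An optional reference is just `Option<&T>`; the caller must handle `None`.
fn first_negative(v: &[i32]) -> Option<&i32> {
    v.iter().find(|x| **x < 0)
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5];
    // for x in &v { if x % 2 == 0 { v.remove(0); } }
    // The loop above would not compile: `v` is borrowed by the iteration
    // and cannot be mutated inside it.
    drop_evens(&mut v);
    println!("{:?}", v);

    match first_negative(&v) {
        Some(n) => println!("found {}", n),
        None => println!("no negative element"),
    }
}
```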
3 Likes

The biggest pitfall I have with C++ is understanding it. And I have used it for years.

It's such a huge and complex language that I have seen even Bjarne stumbling to understand why a few lines of code in a presentation slide did not work as expected. It was at that point I realized it's not me being slow, C++ is really impossible to understand.

I do pray that Rust does not succumb to that creeping feature bloat and complexity.

7 Likes

Really a 35 year old computer language cannot be expected to be easy to read. I find it surprising it has taken this long for an alternative to show up...

Could you clarify what you mean there?

Is it that 35-year-old languages were, and still are, hard to read? I think not. Languages like ALGOL, Coral, PL/M and C are pretty straightforward. As I recall, so was C++ when I had Bjarne's first C++ book, a rather slim and manageable volume.

Or is it that if a language survives for 35 years it will have accumulated so much cruft and so many bolted-on features that nobody can fathom it all anymore? As in C++.

We see this death by complexity going on in other languages too, like JavaScript accreting "class", and JS and Python accreting "async". And so on.

1 Like

I don't think I can clarify... it is more from a sense of how the world has changed. A computer 35 years ago was a very different thing from what we have today, and the story of programming languages is one of increasing legibility and abstraction away from coding in binary...

You list many languages as easy to read, but where are they today? The only one I know of (consider I was a toddler in '85) is C, and frankly I don't find it that easy to read.

I think at best I mean: if Rust is still around in 30 years, no matter how bloated, it should be considered successful but likely in need of a successor. C and C++ cannot last forever.

2 Likes

Here's something I wrote on the subject in another thread:

4 Likes

Non-copyable structs.

In C/C++ structs are copyable. That's super dangerous for structs that own heap memory (Vec/vector), because if you'd allow naive copying of the struct data, you'd get double-free or use-after-free.

C works around that by working only with pointers to heap-allocated structs, which adds overhead.

Making things non-copyable in C++ is a bit complex. But usually C++ solves the problem the other way: by adding copy constructors that implicitly perform deep copies or refcounting of the data. That leads to paranoia about hidden inefficiencies caused by passing values around, and needs knowing rules of copy elision.

In Rust you just don't implement Copy. That's it.
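
A small sketch of what that looks like in practice, using a hypothetical Buffer type invented for this example. Assignment moves the value, and a deep copy only happens where clone is written out:

```rust
/// Hypothetical heap-owning type. It does not implement `Copy`, so assignment
/// moves it instead of duplicating the pointer to its heap data.
#[derive(Debug, Clone, PartialEq)]
struct Buffer {
    data: Vec<u8>,
}

/// Takes ownership of the buffer; the caller can no longer use it.
fn consume(b: Buffer) -> usize {
    b.data.len()
}

fn main() {
    let a = Buffer { data: vec![1, 2, 3] };
    let copy = a.clone(); // deep copy, but only because we asked for it
    let b = a; // move: `a` is no longer usable
    // println!("{:?}", a); // would not compile: use of moved value
    println!("{} {}", consume(b), consume(copy));
}
```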

8 Likes

To be fair, in modern C++ this particular pitfall can be avoided by deleting the copy ctor and leaving only the move ctor available. You can then define an explicit clone method if need be.

It is true that generating a deep copy ctor by default is a bad default, though.

Other than that, I'm not sure I agree with the idea that returning a reference to a local can be prevented by using smart pointers in C++. Using smart pointers changes the ownership semantics of the code and is not always desirable. Code that overuses shared_ptr in particular exposes itself to reference cycles, and adds overhead compared to the reference solution.

To answer the question by taking a step back, I will summon the example of this post of mine, where a single-character typo caused undefined behaviour in my C++ code. Rust gives me the guarantee (and the serenity, if I may say so) that a typo will never trigger UB in safe Rust. This tranquility alone is worth a lot. Maybe it is possible to write memory-safe C++, but you have to be constantly on your guard, gauging everything you (and your colleagues) write, detecting every bad pattern, and knowing and remembering all the bad C++ defaults, because a single mistake can open the UB gate. I prefer to use the time I would have spent chasing wild pointers in C++ to think about my Rust code design and look for logic errors (of which many more can be caught statically in Rust thanks to the more expressive type system).

7 Likes

int* bad()
{
    int x = 5;
    return &x;
}

It may seem obvious to someone experienced with C or C++ that this distilled example is wrong, but as lifetimes/ownership get more complex, they get harder to keep track of in one's head. I've seen similar things in the wild...
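
For contrast, a sketch of the Rust side. The direct translation is shown only as a comment because it is rejected at compile time; good and longest are illustrative names for two common ways out.

```rust
// fn bad() -> &i32 {
//     let x = 5;
//     &x
// }
// The direct translation above does not compile: the return type needs a
// lifetime, and no valid lifetime exists for a reference to a local.

/// Return the value itself; ownership moves to the caller.
fn good() -> i32 {
    let x = 5;
    x
}

/// Or return a reference tied to inputs that outlive the call.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    println!("{}", good());
    println!("{}", longest("stack", "heap"));
}
```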

3 Likes

Thanks for all your replies.

I've noticed that nobody has yet posted anything related to exceptions vs. panics and stack unwinding. I think that is another big topic where C++ is quite weak and where Rust is way more robust. What do you think?

This may not be entirely accurate, but I believe when C++ was conceived, the programming language community had more respect for self-programmable meta-language than I think they do today. So one of C++'s original design parameters was "user code should act like language code," and it was decided that objects should have language-supported constructors, so that you could do things like implement user-defined literals. But constructors can call arbitrary code, so what if a constructor throws an exception? Now you need syntax for catching constructor exceptions. But what if you are constructing an array? Now you need special syntax for initializing an array with objects whose constructors may throw exceptions.

Or you could have let primitives be special.

In C++, references are always non-null, (except when they are null,) because they're always created from already-constructed objects. So what if you want to conditionally construct an object and make a reference to it? The if conditional is a statement, so you can't construct a reference from a value it returns, so what you have to do is put that conditional inside the constructor. But to do that, you must be able to dispatch a function with the same name on the basis of its argument types, which means C++ inevitably needs overloading.

Or you could have used an if expression.
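
As an illustration of that last point, here is a minimal Rust sketch (pick is a made-up name) where the value of an if expression initializes a reference directly, with no constructor or overloading involved:

```rust
/// Picks one of two existing strings and returns a reference to it.
fn pick<'a>(use_first: bool, first: &'a str, second: &'a str) -> &'a str {
    // `if` is an expression, so its value can initialize the reference.
    let chosen: &str = if use_first { first } else { second };
    chosen
}

fn main() {
    println!("{}", pick(true, "alpha", "beta"));

    // Conditional construction followed by taking a reference works the same way.
    let verbose = false;
    let label = if verbose {
        String::from("verbose mode")
    } else {
        String::from("quiet")
    };
    let r: &String = &label;
    println!("{}", r);
}
```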

So much of C++ would be better if the language didn't try to claim dominion over constructing objects. (But that probably wouldn't have happened if everyone in the 80's didn't suffer the fever dream of thinking that LISP-like macro metaprogramming was the inevitable future of programming.) Incidentally, this is why I'm wary of anybody whose idea of "the future of programming" involves writing Domain Specific Languages. If you want to write in a language designed to have programmable syntax, then you're creating a situation in which there are few common concepts to communicate with.

9 Likes

Unwinding can cause serious problems in Rust too, and is a persistent source of bugs in Rust libraries in my experience. However, Rust does mitigate this problem in a few ways:

  1. Result and Option and ? are the preferred method of dealing with recoverable errors, so panics in typical Rust code are rarer than exceptions in typical C++ code.
  2. Rust discourages catching of panics in most cases, so it's less common that a program will continue after unwinding.
  3. The worst panic-safety bugs, like the ones in the link above, are only possible in unsafe Rust, so they affect less code and are easier to audit for. (The bugs above were found because people audited the unsafe code, not because they caused errors in production.)
  4. Rust doesn't have overridable copy constructors or move constructors, so simple operations like assignment are guaranteed not to panic. Values can be moved/copied without running arbitrary code. This helps library authors limit the places where they must guard against panics in user code.
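
To illustrate point 1, a minimal sketch of a recoverable error handled with Result and ? rather than unwinding. The function name parse_and_add is invented for this example:

```rust
use std::num::ParseIntError;

/// Parses two numbers and adds them. On a parse failure, `?` returns early
/// with the error value instead of unwinding the stack.
fn parse_and_add(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let x: i32 = a.trim().parse()?;
    let y: i32 = b.trim().parse()?;
    Ok(x + y)
}

fn main() {
    for (a, b) in [("2", "40"), ("2", "forty")] {
        match parse_and_add(a, b) {
            Ok(sum) => println!("sum = {}", sum),
            Err(e) => println!("error: {}", e),
        }
    }
}
```

The error is an ordinary value in the function's signature, so callers cannot overlook it the way they can overlook an undocumented exception.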
9 Likes

Dependencies and build system issues.

Try maintaining a C++ cross-platform code base depending on, say, boost and you'll see...

CMake, ninja and Conan help, but even with these tools (and they are excellent), it's still full of pitfalls.

If you want specifics I can probably give you a few examples, but I'd rather forget all about it.

Cargo FTW!

1 Like

I am really impressed by the Google Sanitizers. These tools are basically a compiler-built-in variation of what Valgrind does. Using Rust doesn't obviate the need for these tools, but I could never trust any substantial project written in C or C++ without them.

The very existence of these tools is proof that writing safe code in C or C++ is extraordinarily difficult. A code base shared by a team of developers increases the difficulty exponentially for each team member added. In my experience, this is because understanding the safety implications of wild void * pointers, bounds checking, avoiding double-frees, and correctly synchronizing shared access with mutual exclusion locks requires keeping the entire program state in your head at all times.

It isn't always possible to review code written by other team members, and even when you do, it isn't always possible to keep the state of the existing code in your head, while also bringing in the new state from the new code in a cohesive way, without introducing new errors. In other words, the compiler only does what you tell it to do, and there are many many more unsafe things you can tell it than there are safe things. Thus Valgrind and the Sanitizers were born.

What about Rust? If you use any unsafe, these tools will help check (at runtime) what the Rust compiler cannot (at compile-time). So these tools are still a good fit. However, if you are only using the safe subset of Rust, it's very unlikely that these tools will tell you anything you don't already know. E.g. that the code is memory safe, thread safe, free of undefined behavior and data races.

BTW, the leak sanitizer, Valgrind's memcheck tool (for leak detection), and leak detectors in allocators like jemalloc are all still very relevant to the safe subset of Rust. The language doesn't provide protection against memory leaks: Memory Leaks are Memory Safe
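
A minimal sketch of such a leak in entirely safe Rust, using a reference cycle between two Rc values. The Node type and make_cycle function are hypothetical, written only for this illustration:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical node type that can point to another node.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

/// Builds two nodes that point at each other and returns the strong count of
/// the first one. Each node keeps the other alive, so neither is ever freed.
fn make_cycle() -> usize {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(Rc::clone(&a))) });
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    Rc::strong_count(&a)
}

fn main() {
    // The count is still above zero after the locals go out of scope,
    // which is exactly what a leak detector would report.
    println!("strong count of a: {}", make_cycle());
}
```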


The incredible ease by which C and C++ can violate memory safety has bitten me more times than I can count. Often with a rude 3 AM wake up call with segfaults in production code that customers depend upon for their businesses. This same ease of shooting oneself in the foot is the sole reason I hated preemptive threading for many years, to the point where I avoided it entirely and put all of my trust in process boundaries without shared memory. Not to mention countless weeks spent debugging code, saying things like "I have no idea why this doesn't work," only to find that the bug actually exists in some completely unrelated code path.

If you can write flawless C or C++ code (and that's a big if! In fact, I don't trust anyone with the hubris to believe they are that flawless) then maybe these languages are just fine for you. But only if you're the only person who ever touches the code. Alternatively, if you already have an excellent CI story with a comprehensive test suite, including automation around Valgrind or the Google Sanitizers, you're probably pretty well off at avoiding C++ pitfalls. You just happen to find out about them during potentially long CI test runs, instead of at compile time in your development workspace.

I don't mean this as a harsh critique of C++. But it's rightfully deserved, so I don't feel too bad. :slight_smile:

6 Likes