Smart-pointers: C++ vs Rust

... unless the optimizer collapses some loop into counter += N. Improbable, but not impossible.

Relevant reference.

1 Like

Another advantage that Rust's smart pointers have is that they follow the Rust ABI, which allows them to be passed like other normal pointers. C++'s smart pointers are notably not zero-cost, because they are treated like ordinary structs and thus always passed on the stack instead of in registers.

3 Likes

Do you have any example showing that?

Sure, see for example Compiler Explorer. In the Rust code passing Box to an opaque function is a no-op, but passing a unique_ptr in C++ will involve a whole bunch of operations. Among other things (e.g. the code for the destructor, due to baz potentially not destroying the unique_ptr) you can also notice the write and subsequent read at the address in rsp in order to pass the unique_ptr to bar (and also the read of the value from the address in rdi, which was where foo's caller put the unique_ptr).

There is also a kinda famous talk that goes in depth on this problem if you're interested https://www.youtube.com/watch?v=rHIkrotSwcc&t=1049s

13 Likes

This is fascinating; I believe I’ve even seen that video before, but only this time I realized the important detail about C++ here that I had never known before:
When you have by-value arguments to functions, the destruction is handled by the caller!


I was already aware that C++’s version of “move” operations works through move constructors, which don’t fully get rid of the original, moved-from object, but instead are just … let’s say … encouraged to rob the original object of all of its resources (especially ownership of memory).

Of course, Rust can’t really model normal “constructors” accurately at all, because there is always a move in Rust, never in-place construction; but ignoring this detail, C++ move seems somewhat comparable to mem::take; the original object stays in place, but is left in some kind of cheap dummy state. (Of course mem::take is different in that it leaves behind a very clearly defined state; whereas a moved-from object in C++ is often only promised to be in some “unspecified but valid state”.)
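For anyone less familiar with mem::take, here’s a minimal sketch of that “cheap dummy state” behavior (using a Vec as a stand-in for an arbitrary resource-owning type):

```rust
use std::mem;

fn main() {
    let mut v = vec![1, 2, 3];
    // mem::take moves the value out and leaves Default::default() behind,
    // much like a C++ move is "encouraged" to leave an emptied-out object
    let taken = mem::take(&mut v);
    assert_eq!(taken, vec![1, 2, 3]);
    // unlike C++'s "unspecified but valid state", the leftover state
    // is precisely defined: an empty Vec
    assert!(v.is_empty());
}
```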

This of course already has some down-sides; e.g. all objects that you want to move need some additional sort of null-like state. Anyways… with that knowledge, I’ve always kind-of assumed that

fn pass_on_the_box(x: Box<Foo>) {
    other_function(x);
}

when "translated" to C++ would become more like the (moral) equivalent of

fn pass_on_the_box(mut x: Option<Box<Foo>>) {
    other_function(mem::take(&mut x));
}

Now, our value became nullable; and x will also still be dropped at the end of the function (by then containing a None value).

Really, I should have thought about this longer – the destructor argument makes little sense! Even without knowing anything about other_function, as long as the type (Option<Box<Foo>>) is known, the compiler can optimize this code; after inlining the “move constructor” (mem::take) and the destructor (drop glue of Option<Box<Foo>>), it should easily be able to spot that x will be None after mem::take, and that dropping it is a no-op. And Rust has similar behavior already, anyways, since with so-called "drop flags", all variables do, technically, have Option-like properties: an additional flag[1] that tracks initialization status, and a conditional destructor call at the end of their scope[2].
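The drop-flag behavior can be made visible with a small made-up Tracked type whose destructor bumps a counter:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

struct Tracked;
impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let x = Tracked;
    // Whether x gets moved out here is only known at runtime, so the
    // compiler keeps a hidden "drop flag" recording whether x is still
    // initialized at the end of its scope.
    if std::env::var_os("THIS_VAR_IS_UNSET").is_some() {
        drop(x); // moved out: the flag is cleared, no second drop later
    }
    assert_eq!(DROPS.load(Ordering::SeqCst), 0); // not dropped yet
    // x's conditional destructor call happens here, guarded by the flag
}
```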


But alas, it all makes much more sense now! The issue is: In order to let the caller handle the destruction (instead of the callee), C++ pretty much does the (moral) equivalent of

fn pass_on_the_box(x: &mut Option<Box<Foo>>) {
    other_function(&mut mem::take(x));
}

when calling with by-value arguments. And this also finally makes the produced assembly very comparable! (Essentially identical, actually.) (Removing the noexcept from @SkiFire13's example, because this reproduction in Rust handles the unwinding case, too. The version with noexcept could be compared with the Rust version compiled with -C panic=abort)
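Here’s a runnable sketch of that “caller destroys” model, simplified so that other_function takes the Box by (Rust) value instead of recursively using the same &mut Option scheme (all names here are illustrative):

```rust
use std::mem;

fn other_function(x: Box<i32>) {
    assert_eq!(*x, 42);
}

// Moral equivalent of the Itanium C++ ABI convention: the caller owns
// the argument slot and runs the destructor; the callee only receives
// a pointer to it.
fn pass_on_the_box(x: &mut Option<Box<i32>>) {
    // the "move constructor": take the contents, leave None behind
    other_function(mem::take(x).expect("caller passed a value"));
}

fn main() {
    let mut slot = Some(Box::new(42)); // caller-owned slot
    pass_on_the_box(&mut slot);
    assert!(slot.is_none()); // caller drops whatever is left (a None)
}
```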

Still, I’m having a hard time figuring out any of the benefits of this approach.

I have found out a few things already, such as

  • temporaries are dropped at the end of the full expression, and by-value arguments are somehow also temporaries? Not sure how much of this is prescribed in the standard, so complexity w.r.t. temporaries and/or standard compliance might be issues
  • changing it now is clearly ABI-breaking, and…
  • …alternatively, introducing only the option of callee-destruction for certain types and/or arguments, would have surprising effects
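The first bullet – temporaries living until the end of the full expression – can be observed from Rust’s analogous rule with a small logging type (Noisy and first_byte are made-up names for illustration):

```rust
use std::cell::RefCell;

thread_local! {
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

struct Noisy(&'static str);
impl Drop for Noisy {
    fn drop(&mut self) {
        LOG.with(|l| l.borrow_mut().push(self.0));
    }
}

fn first_byte(n: &Noisy) -> u8 {
    LOG.with(|l| l.borrow_mut().push("using temporary"));
    n.0.as_bytes()[0]
}

fn main() {
    // The Noisy temporary outlives the call, but is dropped at the `;`
    // that ends the full expression, before the next statement runs.
    let b = first_byte(&Noisy("temp"));
    LOG.with(|l| l.borrow_mut().push("next statement"));
    assert_eq!(b, b't');
    let log = LOG.with(|l| l.borrow().clone());
    assert_eq!(log, vec!["using temporary", "temp", "next statement"]);
}
```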

  1. possibly even many flags for partial moves ↩︎

  2. at least this is the case as soon as the final state isn’t clear by static analysis ↩︎

11 Likes

Don't feel bad about it. No single human understands all of how C++ works or how its parts interact in strange ways. Not even Bjarne Stroustrup.

I pray Rust will not endlessly accrete complexity like that into the future.

2 Likes

I believe I learned that from the same video and then promptly immediately forgot it again. Hopefully your memory survives longer than mine!

Editions help, but this is getting off-topic.

[this used to be a footnote, but it’s a bit long for Discourse’s rendering style]

With editions, we can remove much complexity from the language, because backwards-compatibility concerns are much less limiting. Of course it doesn’t help in all cases, but it does in many.

One good example could be the current work on match ergonomics. The end goal is to make match ergonomics more intuitive and simple. Match ergonomics themselves were an addition to the original story of Rust patterns (where you would need to handle all references with & or &mut patterns and then decorate variables with ref or ref mut binding modes).

Then Rust got a fully backwards compatible update, “match ergonomics” that allows you to leave out those &Struct { field: ref x, .. } annotations and match s: &Struct directly against a Struct { field: x, .. } style pattern.
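A small before/after sketch of that change (Struct and the bindings are made-up names):

```rust
struct Struct {
    field: String,
}

fn main() {
    let s = &Struct { field: "hi".to_string() };

    // Original Rust 1.0 style: dereference explicitly, use a `ref`
    // binding mode to avoid moving the field
    let Struct { field: ref x } = *s;
    assert_eq!(x, "hi");

    // With match ergonomics: match the &Struct directly; the
    // "default binding mode" silently becomes `ref`, so y: &String
    let Struct { field: y } = s;
    assert_eq!(y, "hi");
}
```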

Turns out, that design wasn’t perfect, and has multiple flaws; some of them perhaps preventable, but much of the issue is the pretty high complexity of the exact rules, and especially the often surprising, subtle consequences; the system works by implicitly tracking a property throughout all patterns, called the "default binding mode" (compare the Reference and the relevant RFC).

Now, the 2024 edition allows us a redesign to a simpler mechanism … well, maybe let’s wait with that determination until the exact design has been chosen … but the point is to make the behavior more intuitive, especially in many (semi-)corner cases. (I’m personally a fan of Nadrieril’s ideas/(unfinalized?) proposal in this context. I think there is a lot of value in finding an approach that – essentially – avoids the notion of "binding modes" entirely.)

[I personally wouldn’t be surprised if, long-term, we can get rid of ref mut and ref patterns entirely. Just give more powers to references… if they could infer and track borrows, and perhaps add a notion of an “&move” reference.]

Another example is 2024-edition changes to temporary scopes. It’s a change that might be a little bit breaking even despite edition support[1] – but even when it’s not always completely smooth and automatic to migrate your code, most importantly all existing library code keeps working and can be imported without issues! And as long as this is ensured, even fairly fundamental changes can be made to Rust, especially – as is the case here too – if they serve to remove complexity from certain language rules, by working either towards simpler rules, or at least towards ones with a more intuitive effect.

(Last, but not least: The safety of Rust of course also means that in many areas, complexity is much less bad! The problem in C++ is that you, the programmer, are supposed to understand it all: how long each object lives, what steps are necessary to ensure thread-safety, where the tens or hundreds of completely unnecessary extra ways to achieve UB are, like i++ + i++; or signed integer overflow; or the 3? 6? 10? different ways one could initialize a variable, with insanely arbitrary rules & interactions, especially w.r.t. the effect of zero-initialization vs. leaving memory uninitialized.)
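For contrast, a quick sketch of how Rust pins down signed overflow instead of leaving it as UB – plain `+` panics in debug builds and wraps in release, and the explicit methods each have exactly one defined meaning:

```rust
fn main() {
    let x = i32::MAX;
    // every overflow strategy is spelled out and fully defined
    assert_eq!(x.wrapping_add(1), i32::MIN);    // two's-complement wrap
    assert_eq!(x.checked_add(1), None);         // overflow reported as None
    assert_eq!(x.saturating_add(1), i32::MAX);  // clamp at the boundary
}
```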


  1. macros are always a bit hard with editions; and in this case, the migration isn’t perfect; certain code can’t be directly represented in edition-2024 code at all (yet?) ↩︎

4 Likes

So very true. But since this is something I had to deal with very intimately, at some point… I can add some clarifications.

And, as usual with C++, you have missed some peculiar, but important details.

Which standard are you talking about? There are a few relevant ones here.

Indeed.

This is something that's called, in the relevant standard, non-trivial for the purposes of calls, and it means precisely what it says on the tin.

Since you used a compiler that follows that standard, and since having a destructor disqualifies a type from being passed by value… you are observing what you are observing.

But if your compiler doesn't follow that standard (e.g. MSVC doesn't), then the object would be passed by value and it would be destroyed by the callee, not the caller.

Yes. And that's why compilers that developed their ABI before C++11 (and thus before std::move) couldn't change their behavior: before rvalue references and std::move there was really no way to move an object into a function; the most the language could do was copy it. And that's also what MSVC does if your type doesn't have a move constructor. Note that if you do have a move constructor, it's not even used! But the mere presence of a move constructor gives the compiler the option to move the object and call the destructor in the callee function.

Sadly, the Itanium C++ ABI wasn't altered in time, and compilers that follow it (meaning all UNIX systems, in practice) are stuck with this odd and peculiar behavior.

What surprising effects are you talking about? The option does exist in clang, but it's opt-in, because it breaks compatibility.

Since MSVC is a popular enough compiler, most programs work just fine in that mode, too.

1 Like

Ah, I wonder if somebody can tell me why there is an "Itanium C++ ABI"?

Seems very odd to me given that Itanium does not exist anymore and barely ever did. Neither I nor anybody I know has ever seen one.

Also, I thought ABIs depended on what registers are available in a processor. What does it mean to have an Itanium C++ ABI for x86, ARM, RISC-V, whatever? For C++ or any other language?

Could you? My understanding is that what happened to C++ is pretty much inevitable, and editions only help language users (because you can say that “crazy behavior” stays in the past and introduce “better modern behavior”) – and then only when they don't have to deal with crates compiled for old Rust editions.

Actual simplification of the language may only happen down the road, when some editions wouldn't just be deprecated, but would be fully removed.

Are there even plans to do that?

Ah, that was an observation stated in the FAQs of that video above; if I recall it correctly. I can’t personally judge whether it’s really that surprising, but essentially the effects can be that function arguments are destroyed in a weird order. (I’m not personally familiar with any canonical examples of C++ code where that order matters, so I can’t judge.)

Ah, I guess “complexity” is too vague. I understood @ZiCog’s reply about “endlessly accrete complexity” to mainly target complexity from the point-of-view of a language user. Though I can’t be sure.[1]

Compilers are complex anyways…[2] and you are of course correct that editions can’t remove any complexity from the language as a whole.[3] I am not aware of any deprecation plans for old editions. So far, I’m not aware of any concerns at all, that they might be too much effort to maintain long-term.[4]

A new syntax design, like [the parsing-ambiguity fix & stronger initialization guarantees that came with] braces for constructors, or something like -> …-style return types on functions, wouldn’t have to be added as an alternative, but could eventually replace the original syntax. For example, dyn Trait didn’t exist before 2018; but then, with the 2021 edition, the old trait object syntax was completely “removed”.
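A minimal illustration of that now-mandatory dyn syntax (Speak and Dog are made-up names):

```rust
trait Speak {
    fn speak(&self) -> &'static str;
}

struct Dog;
impl Speak for Dog {
    fn speak(&self) -> &'static str { "woof" }
}

fn main() {
    // Since the 2021 edition the `dyn` keyword is mandatory;
    // the bare `Box<Speak>` spelling from 2015 is a hard error.
    let s: Box<dyn Speak> = Box::new(Dog);
    assert_eq!(s.speak(), "woof");
}
```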


  1. And to be fair, appeared in the context of destructors of by-value arguments; and that detail probably isn’t super relevant for most users (beyond the slight negative performance issues). ↩︎

  2. and many rules of programming languages aren’t actually that complex; write them down properly and they shouldn’t really overwhelm a compiler author ↩︎

  3. and indeed the old editions are still fully “part of Rust”; just the normal user who might still have some Rust-2015 code in a dependency, shouldn’t have to worry about Rust-2015 any more than about the C language for any of the C libraries that are being linked by his or her dependencies. ↩︎

  4. Not surprising IMO, given that compilers are by design machines that translate (in multiple steps even) feature-rich surface language into more simple&uniform internal representations; editions fit right into this framework. ↩︎

I can only answer about C++ because any other language would have to decide for themselves how and why they would adopt it.

Itanium C++ ABI is designed for C++, after all.

You are suffering from hindsight. We know, in the year 2024, that Itanium would be a huge flop that led nowhere. But in 1999 (and the Itanium ABI was developed in 1999, right after the ISO C++98 standard was published; scroll down the page to the revision history)?

It was the new hotness that was supposed to replace, literally, everything: Alpha, PA-RISC, PowerPC, SPARC, and x86, too! Heck, Windows XP 64-Bit Edition only worked on Itanium; it didn't work on x86-64. Sure, a few years later, in 2005, Windows XP Professional x64 Edition would be released, but for a few years Opteron users had to use Linux if they wanted 64-bit, because 64-bit Windows was developed exclusively for Itanium, and Microsoft was still waiting for that death of everything else!

Lots of things that we use today were developed exclusively for Itanium, in the beginning: EFI and GPT, among other things.

And since most Unix vendors planned to replace their CPUs with Itanium… they developed the Itanium C++ ABI (C++ had just gotten its first standard, but it didn't include an ABI).

But they haven't started in vacuum, of course.

Sure, the System V ABI supplements do that – but they only describe how the C ABI works, because, well, in 1983, when System V arrived, C++ kinda-sorta didn't exist yet.

There is no “Itanium C++ ABI for x86”. The “Itanium C++ ABI” is a supplement to the “System V ABI”… developed specifically for Itanium, but it delegates all the gory details of how arguments are placed in registers to the System V ABI - IA-64 Architecture Processor Supplement, or to the System V ABI - Intel386 Architecture Processor Supplement, or to the x86-64 psABI…

Because it's a C++ ABI developed for Itanium-based systems by a consortium of companies that were prepared to switch from their own proprietary architectures to Itanium… why wouldn't they call it the “Itanium C++ ABI”?

Itanium was supposed to supplant everything… in that future (that never happened) nobody would ever wonder why the official C++ ABI is called the “Itanium C++ ABI”: it's the ABI for the only surviving CPU; why would it be called anything else?

Alas, history went in the other direction and now people are puzzled and surprised by that name… but it was pretty much an obvious name for what it was when it was developed.

3 Likes

It's not really that surprising (after all, MSVC does that already); it's just that the time to make that change was when rvalue references and move constructors were added: since pre-C++11 classes couldn't have move constructors, any type that declares one should have been ready to deal with the new order of destructor calls.

But because that opportunity was missed and C++ doesn't want to break backward compatibility… yeah, that's precisely the type of change that Rust editions address.

Very, very close to how Rust changed the drop order to support let chains.

3 Likes