Rust C++ FFI suggestions

Hi there!

For performance reasons, I want to export an already existing C++ type to Rust so that I can transfer ownership from C++ to Rust and use it by value in Rust (not behind a Box/Arc, which would come at a performance penalty). However, this type contains a C++ std::string_view and a std::variant with two alternatives, both of which are movable in Rust terms. What's the best way to define such a type so that it can be created on the C++ side and then moved to Rust, where it can be used as a "Rust native" type? I have already thought about several options:

  • Extend cxx to support string_view and variant. This seems to be a daunting task given the structure of cxx and the opinionated nature of this lib. In addition, for something like string_view, the bridge gen facilities of cxx are over-the-top; a simple library like libc (which is for C and POSIX bindings) would be enough. Does something like this exist for C++ standard types?
  • Use bindgen. Works for string_view, doesn't work for variant. The generator gets totally confused, most likely due to the variadic templates. Is there a way to make bindgen generate bindings just for the two-type case?
  • Implement custom bindings. This would require maintaining binding variants for each existing and supported C++ library, plus a way to reliably detect ABI changes, e.g. using static checks and enough tests. This could, however, be a community-driven undertaking (e.g. libcpp, see the first point) that would offer by-value semantics where possible and opaque-type-like support for other types.
  • Other options?

Looking forward to opinions!

The easiest solution is probably to create an equivalent Rust type, write a set of C functions that read the original and give you its parts, and then use those parts to assemble the Rust one. If the C type is easy enough, you might even be able to make a repr(C) Rust type and just memcpy + transmute one to the other.
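
To make the accessor idea concrete, here is a minimal Rust sketch. Everything named in it (CppMessage, the msg_* accessors, the i64/f64 alternatives) is made up for illustration. In a real build the accessors would be extern "C" functions implemented in C++; they are written in Rust here only so the sketch compiles on its own.

```rust
use std::ffi::c_void;
use std::{slice, str};

// Stand-in for the C++ object. In a real build it lives on the C++ side and
// Rust only ever sees an opaque pointer to it.
pub struct CppMessage {
    name: &'static str,
    payload: CppPayload,
}

enum CppPayload {
    Int(i64),
    Float(f64),
}

pub fn demo_message(use_int: bool) -> CppMessage {
    let payload = if use_int { CppPayload::Int(42) } else { CppPayload::Float(1.5) };
    CppMessage { name: "sensor", payload }
}

pub fn handle_of(m: &CppMessage) -> *const c_void {
    (m as *const CppMessage).cast()
}

unsafe fn as_msg<'a>(h: *const c_void) -> &'a CppMessage {
    unsafe { &*h.cast::<CppMessage>() }
}

// Hypothetical accessor shims (these would be written in C++ in a real build).
pub unsafe extern "C" fn msg_name_ptr(h: *const c_void) -> *const u8 {
    let msg = unsafe { as_msg(h) };
    msg.name.as_ptr()
}
pub unsafe extern "C" fn msg_name_len(h: *const c_void) -> usize {
    let msg = unsafe { as_msg(h) };
    msg.name.len()
}
// Returns 0 for the integer alternative, 1 for the float alternative.
pub unsafe extern "C" fn msg_payload_index(h: *const c_void) -> u8 {
    let msg = unsafe { as_msg(h) };
    match msg.payload {
        CppPayload::Int(_) => 0,
        CppPayload::Float(_) => 1,
    }
}
pub unsafe extern "C" fn msg_payload_int(h: *const c_void) -> i64 {
    let msg = unsafe { as_msg(h) };
    match msg.payload {
        CppPayload::Int(v) => v,
        CppPayload::Float(_) => 0,
    }
}
pub unsafe extern "C" fn msg_payload_float(h: *const c_void) -> f64 {
    let msg = unsafe { as_msg(h) };
    match msg.payload {
        CppPayload::Float(v) => v,
        CppPayload::Int(_) => 0.0,
    }
}

// The "Rust native" equivalent, assembled from the parts.
#[derive(Debug, PartialEq)]
pub enum Payload {
    Int(i64),
    Float(f64),
}

#[derive(Debug, PartialEq)]
pub struct Message<'a> {
    pub name: &'a str,
    pub payload: Payload,
}

/// Safety: `h` must point to a live message, and the bytes behind the name
/// must stay valid for as long as the returned `Message` is used.
pub unsafe fn import_message<'a>(h: *const c_void) -> Option<Message<'a>> {
    unsafe {
        let bytes = slice::from_raw_parts(msg_name_ptr(h), msg_name_len(h));
        // A string_view is not guaranteed to hold UTF-8, so validate it.
        let name = str::from_utf8(bytes).ok()?;
        let payload = match msg_payload_index(h) {
            0 => Payload::Int(msg_payload_int(h)),
            1 => Payload::Float(msg_payload_float(h)),
            _ => return None,
        };
        Some(Message { name, payload })
    }
}
```

The cost is a handful of small calls plus one UTF-8 check, with no allocation; the name is still borrowed from the C++ side, so its lifetime remains the caller's responsibility.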

I can't help you there, but I can outline the amount of work needed. Crubit is trying to do what you are trying to do. They have a few software developers working on the integration full-time, but they don't even talk about ever supporting std::variant. You may see what they can or cannot do today here. There is way too much subtlety in trying to support std::variant. Maybe, just maybe, you would be able to establish enough data to support your particular std::variant manually, but I don't think automatic conversion is in the cards any time soon. Sorry.

I would suggest that the Rust side export a bindgen type and the C++ side do the transform, perhaps using factory functions exported from Rust (though smuggling the output type via C will be a bit ugly).

Any way you do things, you're going to be paying some level of translation cost, since there's only a C shared ABI here, and it can't represent a natural version of either the C++ or the Rust side of this. The question is whether the cost is notable at all, which might depend on your context, but it probably shouldn't matter so long as there are no allocations or O(n) copies.

Doing this manually would mean that I need to maintain N*M bindings, with N being the number of supported libc++ implementations and (just theory or also a real thing?) M configurations, as a library could decide to have a different ABI depending on the platform and compiler settings. This is something that I would either like to avoid or at least share. Would you see a point in starting such a library, or is Crubit the place where something like this is maintained in one place?

Yeah, I was also thinking about introducing a wrapper type that has a well-known ABI. But as you already acknowledged, this might incur runtime costs depending on the size of the value type of the variant. They might not be high in most cases (and maybe optimization steps can mitigate or nullify some of them) but there's no guarantee.

However, as long as no compiler intrinsic structures and methods are used (e.g. vtbl, rtti, constructors, operators, ...) it should be possible also for C++ types to be modeled in terms of a C ABI. And at least on libcxx and libstdc++, a variant is just a discriminator and a (nested) union of all the involved types. So if the types are moveable in Rust terms (or trivially_relocatable in C++ terms), a zero-cost by-value import to Rust should be feasible, or am I missing something?
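
As a rough sketch of what "a discriminator and a union" could look like from the Rust side: the field order (storage first, then index), the one-byte index, and the i64/f64 alternatives below are all assumptions made for illustration, not the verified layout of any particular standard library. They would have to be checked against the real headers for each compiler and library version.

```rust
// Hypothetical model of a two-alternative variant as "union + index".
#[repr(C)]
#[derive(Clone, Copy)]
pub union Storage {
    int: i64,
    float: f64,
}

#[repr(C)]
#[derive(Clone, Copy)]
pub struct RawVariant {
    storage: Storage,
    index: u8, // assumed width; real implementations may pick another type
}

// The idiomatic Rust type the raw value is converted into.
#[derive(Debug, PartialEq)]
pub enum Value {
    Int(i64),
    Float(f64),
}

impl RawVariant {
    // Constructors exist only so the sketch can be exercised without C++.
    pub fn from_int(v: i64) -> Self {
        RawVariant { storage: Storage { int: v }, index: 0 }
    }

    pub fn from_float(v: f64) -> Self {
        RawVariant { storage: Storage { float: v }, index: 1 }
    }

    /// Safety: `index` must describe which union field was last written.
    pub unsafe fn into_value(self) -> Option<Value> {
        match self.index {
            0 => Some(Value::Int(unsafe { self.storage.int })),
            1 => Some(Value::Float(unsafe { self.storage.float })),
            // Anything else (for example a valueless variant) is rejected.
            _ => None,
        }
    }
}
```

Both alternatives here are plain Copy data, so moving the raw value is just a byte copy; alternatives with non-trivial move constructors or destructors would not fit this model.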

Crubit is done by Google, which essentially means they only care about the latest version of clang and nothing else.

I'm not using it, it was easier for us to just use extern "C" shims, in the end.

Yes, but there is an insane number of subtle corner cases. E.g. take just std::tuple. Should it be returned like a regular struct with the same elements? Ha! Take tuple<float, float>. While a struct with two float elements would be returned in the XMM0 register, and std::pair works that way with both gcc and clang, std::tuple<float, float> is different.

And for bonus points you can try tuple<__m128, __m128>… here you would need to distinguish between clang 21+ and older versions of clang, too. And on old versions of clang you would need to know if you are using -mavx or not! Fun, isn't it?

Note that Google fixed bugs in clang 21+, which hopefully means that the next version of Crubit will get some support for std::tuple… or maybe not, if more bugs are found…

Sure. But that means you would have to exclude std::optional and std::expected, std::string and std::string_view, std::tuple and std::variant… maybe, just maybe, std::pair would work, but would you be able to explain to the users what is working and what is not?

At this point accepting limitations of extern "C" is just easier…
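
For comparison, one common shape for such an extern "C" shim is an out-pointer: only a pointer crosses the boundary, so the question of how a by-value aggregate is classified into registers never comes up. This is a minimal sketch; FloatPair and get_pair are hypothetical names, and get_pair is implemented in Rust here purely as a stand-in for a function that would be written in C++.

```rust
use std::mem::MaybeUninit;

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct FloatPair {
    pub first: f32,
    pub second: f32,
}

// Stand-in for a C++ shim declared as:
//   extern "C" void get_pair(FloatPair* out);
pub unsafe extern "C" fn get_pair(out: *mut FloatPair) {
    unsafe { out.write(FloatPair { first: 1.0, second: 2.0 }) }
}

// Safe wrapper: reserve uninitialized storage, let the shim fill it in,
// then take ownership of the initialized value.
pub fn fetch_pair() -> FloatPair {
    let mut slot = MaybeUninit::<FloatPair>::uninit();
    unsafe {
        get_pair(slot.as_mut_ptr());
        slot.assume_init()
    }
}
```

The value ends up by value on the Rust side after one store through the pointer, which is usually cheap compared with debugging a calling-convention mismatch.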

Yes, but it also includes inheritance — and you saw what inheritance did for std::tuple.

You are missing the fact that inheritance can play badly with argument passing (as we saw with std::tuple<float, float>) and that there are also compiler bugs (as we saw with std::tuple<__m128, __m128>).

Heck, it took years to fix bugs even with extern "C"! The story with __int128_t is legendary. C++… if you only care about a handful of compilers (or, ideally, one compiler, like Google) and have a few engineers that you can assign to the task, then it may work. Otherwise… no. Not worth it.

If you really-really-really need a few types to be passed by value, do what Crubit is doing: write marshalling just for these few types, add enough tests to CI to ensure that everything works as you expect (don't forget that options like -mavx or -mavx512 may affect the ABI), and don't stray from these. std::tuple should be safe now if you use clang 21+ and libc++; you can look at std::variant and see if it's usable or not…
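
The Rust half of such checks could look roughly like this. StringViewRepr, LayoutProbe and check_layout are hypothetical names, and the field order (pointer first, then length) is an assumption, since the member order of a real std::string_view is implementation-specific. The idea is to fail the build on the Rust side when the mirror type drifts, and to compare against numbers the C++ side reports (e.g. from sizeof/offsetof behind an extern "C" probe) in a CI test per compiler and flag combination.

```rust
use std::mem::{align_of, offset_of, size_of};

// Hypothetical Rust mirror of a pointer + length view type.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
pub struct StringViewRepr {
    pub data: *const u8,
    pub len: usize,
}

// Compile-time checks: the build fails if the mirror type ever drifts.
const _: () = {
    assert!(size_of::<StringViewRepr>() == 2 * size_of::<usize>());
    assert!(align_of::<StringViewRepr>() == align_of::<usize>());
    assert!(offset_of!(StringViewRepr, data) == 0);
    assert!(offset_of!(StringViewRepr, len) == size_of::<usize>());
};

// Layout facts that the C++ side would report through a probe function.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct LayoutProbe {
    pub size: usize,
    pub align: usize,
    pub len_offset: usize,
}

// What Rust believes the layout to be.
pub fn rust_layout() -> LayoutProbe {
    LayoutProbe {
        size: size_of::<StringViewRepr>(),
        align: align_of::<StringViewRepr>(),
        len_offset: offset_of!(StringViewRepr, len),
    }
}

// Compare the C++ report with the Rust view; meant to run in a CI test.
pub fn check_layout(cpp: LayoutProbe) -> Result<(), String> {
    let rust = rust_layout();
    if cpp == rust {
        Ok(())
    } else {
        Err(format!("layout mismatch: C++ {:?} vs Rust {:?}", cpp, rust))
    }
}
```

Size and offset checks catch layout drift only; calling-convention differences such as the -mavx case still need tests that actually pass values across the boundary.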

Or, in other words, there's no C++ ABI, so literally anything can change the layout at any time for any reason.

Including, in theory at least, optimization or code generation internals that you have no information about and no ability to predict, including generating different layouts for the same definition in different parts of the code. (I don't think any compiler currently does anything quite so bad, but that is more a coincidence, since it would be more work for them and provide no value, than any consideration for people trying to do this.)

That's not true, either. One example: why does gcc pass std::pair<double, double> in XMM0 but std::tuple<double, double> in memory? Because the libstdc++ version of std::tuple was born as std::tr1::tuple before C++11 and had to employ certain hacks to be able to exist in a language without variadic templates. These caused it to be passed in memory, and gcc kept it that way for compatibility.

On the other hand clang and libc++ don't do that, so that's why std::tuple<__m128, __m128> was able to go from one register to memory on CPUs with AVX: this was classified as a bug and fixed. Same story as with __int128_t.

Nope. That's absolutely not true. There are standards that explain how different things are supposed to be represented in memory and passed around. But there are bugs, too. So it's not as clear-cut as you want to portray.

E.g. __int128_t story was affecting extern "C", too, not just C++.

The compiler is absolutely, 100% forbidden to do that… but bugs happen. E.g. issue #43573 (which also affects extern "C", not just C++) was reported ages ago and is still not fixed.

In case of C++ you have instability of definitions of types in libc++, that's true. But if you have a definition then you can predict what compiler would do… and handling of extern "C" is just very marginally easier than handling of C++ (I was really surprised to find that crazy story with std::tuple<__m128, __m128> and AFAIK that's the only bug in that area while extern "C" had two more… one fixed, one not fixed).

I guess it depends on what you mean by ABI? GCC made a promise that you could dynamically link across versions, often called "ABI compatibility", but when people say something like "the C ABI" they generally mean something like the System V ABI, which specifically declares exactly how source language declarations are translated to machine-level registers and offsets, or a "documented ABI".

When I say there's no C++ ABI, I'm talking about the latter meaning. Of course a compiler will internally know how it lays out its own types and function signatures, and there are language rules that restrict that pretty severely, but the combination of the golden "as-if" rule and the language rules against accessing private details of a type means that if it could show that two compilation subsets have no way to pass values between each other, then they could use a different layout. Of course there's very little reason to do so, but it's more a point about just how little you're actually guaranteed.

In short, I'm not sure what you're referring to with

since to my knowledge, there's no such effort that attempts to cover C++ definitions, which would be what was needed in this case.

C++ borrows that part from C. It still uses the exact same psABI that C uses.

Yes, but anything that can be accessed via ELF entry points has to follow psABI conventions. Functions that are not exposed externally can use an arbitrary ABI… but these can't be called from outside, so that's not a problem.

Practically speaking, the only thing that's not set in stone is pointers to members. These differ from compiler to compiler. Almost everything else is fixed because of compatibility with C (note that in C++ land any C++ function can be passed to an extern "C" function as a callback… this limits what you can do extremely severely).

Open the documentation for the psABI and you will see that it includes a C++ section and even a Fortran section. Add the Itanium C++ ABI to it and almost everything at the language level is fixed.

The standard library is not covered, of course; that's a different kettle of fish.

Huh, seems like there's been a lot of movement pretty much immediately after I stopped paying attention! :sweat_smile: I've somehow managed to not hear "psabi" in the last decade of occasionally dipping into C++.

Historically, at least, this was only true for the parts that overlapped with C. The adoption of Itanium's spec of all things over a decade after that died is quite the surprise, if a pleasant one. Though I'll bet anyone on MSVC is still boned as usual...

if by "Includes C++ section" you mean literally the text:

For the C++ ABI we will use the IA-64 C++ ABI and instantiate it appropriately. The
current draft of that ABI is available at:
C++ ABI

and nothing else, sure. I'd raise more of an eyebrow if it wasn't for that draft claiming that it's portable over an underlying C ABI (though I feel like surely there's some stuff that's architecturally specific?)

That adoption happened in gcc 3.0, which was released almost a quarter century ago.

The story with MSVC is different: theoretically they reserve the right to break ABIs in a major MSVC release… but the last major MSVC version, _MSC_VER 1900, was released a decade ago.

Yes, some parts are architecture-specific and they mention it separately: look for the Unwind Library Interface and some others.

It's not just the incorporation of the Itanium C++ ABI by reference (as I have said: in practice compilers have used it for a quarter century anyway, it's just the explicit reference added to the psABI that's new); there are other things.

But none of these cover the standard library; these are all strictly about the C++ language, not about its standard library…