Are we approaching C/C++ interop the wrong way?

During an ongoing patch review on the Linux Kernel Mailing List, there's been some grumbling about C<->Rust interop due to boilerplate wrappers that have security implications. One suggestion has been to use clang to parse the C/C++ code and inject it into the IR together with the Rust IR generated by rustc. Is it a realistic idea? Would it be a better solution than bindgen and friends?

2 Likes

Looking at the first reply chain (which is what I assume you're referring to), it seems like the complaint is more that the patch is introducing a wrapper function around a C macro that involves inline assembly.

i.e. the complaint is that Rust has no way to directly ingest arbitrary C constructs and inline them. Someone made a comment that some such constructs could be rewritten as Rust, but that would then require all changes in the C version to be precisely replicated on the Rust version.

(In other comments: Rust is a toy language because it doesn't support ASM goto, it's a toy language for not supporting memops in inline assembly, the syntax sucks, using a third common language for common definitions is madness, thank goodness I can retire before I'm forced to use this, etc.)

*deletes long rambling paragraph about the "just ingest C" argument, due to lack of knowledge on the subject*

12 Likes

There are various complaints in the first chain, but what intrigued me was the second link. AFAIU, the trivial wrapper is needed because the wrapped function is inlined in C, but Rust can only FFI-call a non-inlined function.

There are a few languages (D, Swift, Zig...) that boast seamless interop with C/C++, and I've seen many people wishing the same for Rust. I just don't know whether it's a good idea that requires a lot of work, or a bad idea because X, Y and Z.

Would embedding C in Rust (in a similar way to how we embed assembly) solve the "calling inlined code" use case above? Would it cause an unacceptable dependency on a C/C++ compiler? Is there a better, less intrusive way to solve this?

1 Like

C(++) being what it is, removing the friction is not a great idea. It's a bit like unsafe: the friction there is purely artificial and serves 2 purposes:

  • stop and think about what you're going to do
  • easily find code to review thoroughly
4 Likes

As a compiler ignoramus and armchair spectator it seems to me that C and C++ are very different languages despite their constantly being bracketed together. Any discussion of interoperability is very different with C and C++.

C is a very simple language and is very easy to interface with, what with simple functions, pointers and structs. No name mangling. A common ABI. Easy for every language to deal with, and they do. I have seen it written that "C Isn't A Programming Language Anymore" (Faultlore), the argument being that nobody wants to program in C anymore but C provides the interoperability "glue". I would not go that far but I get the point.

Meanwhile C++ is a huge and complicated language. Should Rust seamlessly interoperate with C++ classes, virtual methods, multiple inheritance, templates, concepts, modules, whatever monster of complexity comes out of WG21 next?

I suspect the effort would be huge and it would have a devastating effect on the Rust language itself, even if it was all confined to "unsafe".

The notion of using clang to achieve the interop is interesting. But it seems to me that it takes Rust further down the road of LLVM dependence. What are the GCC guys going to do with it? Or anyone else?

15 Likes

Yes, despite their common origin they are now quite different. But they share more or less the same footguns (lots of UB without a word from the compiler).

1 Like

Indeed, the same footguns, and C++ has added many surprises of its own.

But not to worry, when it comes to interoperating with foreign code one has already decided to pass through the air-lock of "unsafe" no matter what language is on the other side. Even if the other language were as memory safe as Rust it could not convince Rust of that fact.
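To make the "air-lock" concrete, here is a minimal sketch of calling a C function from Rust. It assumes the C library's `strlen` is available to the linker (which is normally the case wherever `std` is) and a toolchain recent enough to accept `unsafe extern` blocks; older toolchains spell the block as plain `extern "C"`. The helper name `c_string_len` is made up for illustration.

```rust
use std::ffi::{c_char, CString};

// Declaration of a foreign function. Rust has to take the signature on trust:
// nothing here is checked against the C definition.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

// Hypothetical safe wrapper: the `unsafe` block is the reviewable spot where
// we promise the pointer is valid and NUL-terminated.
fn c_string_len(s: &str) -> usize {
    let owned = CString::new(s).expect("input must not contain interior NUL bytes");
    // SAFETY: `owned` is a live, NUL-terminated buffer for the whole call.
    unsafe { strlen(owned.as_ptr()) }
}

fn main() {
    println!("{}", c_string_len("hello"));
}
```

The call compiles only inside `unsafe`, regardless of how well-behaved the code on the other side happens to be.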

1 Like

Reminds me of the unholy mess that is objc++ in gcc. One issue (that I imagine could also apply to Rust + C) is what to output for debug info when your symbols are composed of multiple languages. I'm uncertain whether the objc++ compilers just messed this up by ascribing "objc++" as the DWARF language, or whether DWARF actually has a mechanism for describing sub-symbol language. Even if it does, I highly doubt any debugger actually supports it in practice. And that was in gcc, a single compiler reusing frontends. I can't imagine it'll be easier to do right when you're running separate Rust and clang compilers. So I'm curious whether this is a problem with the languages that were cited as well.

1 Like

There is the DW_LANG_ObjC_plus_plus value for the DW_AT_language attribute.

Which tells you exactly nothing about whether a statement is a C++ statement or an ObjC statement, or how or whether to demangle an objc++ symbol as C++ or ObjC. That is IMO the problem. It should probably keep DW_LANG_ObjC_plus_plus, while keeping intact the DW_LANG_ObjC and DW_LANG_C_plus_plus DWARF generated by each individual frontend, reflecting the nested language that it is.

Anyhow, I guess my main point really is that I haven't seen any debugging-related effort from any of these languages that mix languages at the AST level. Maybe it is being done somewhere, maybe it isn't; I haven't followed lldb. But it undoubtedly causes issues in that regard.

1 Like

There is a single unified grammar where you can mix C++ and ObjC expressions within a single statement afaiu. Just like ObjC allows C expressions and ObjC expressions freely mixed.

If it starts with a + or - it is an ObjC method and if it starts with _Z it is a C++ function or method. Otherwise it is a C function. You already have this mixing problem with plain ObjC and plain C++. Both can define C symbols.
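Spelled out as code, that rule is a simple prefix check. This is only a naive sketch of the heuristic as stated: it assumes Apple-style ObjC method names and Itanium-style C++ mangling, ignores any platform-specific leading underscore, and the names `SymbolLang` and `classify_symbol` are invented for the example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SymbolLang {
    ObjC,
    Cxx,
    C,
}

// Classify a symbol name purely by its prefix, as described above.
fn classify_symbol(name: &str) -> SymbolLang {
    if name.starts_with('+') || name.starts_with('-') {
        SymbolLang::ObjC
    } else if name.starts_with("_Z") {
        SymbolLang::Cxx
    } else {
        // Anything else is treated as a plain C symbol.
        SymbolLang::C
    }
}

fn main() {
    for sym in ["-[NSString length]", "_ZN3foo3barEv", "printf"] {
        println!("{sym}: {:?}", classify_symbol(sym));
    }
}
```

Note that the fall-through case is exactly where the mixing problem shows up: a C symbol defined from an ObjC or C++ translation unit looks the same as any other C symbol.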

2 Likes

People who portray themselves as ignorant despite good documentation being out there (not to mention resources like URLO being available i.e. it's possible to get answers to directed questions) deserve to have their voice ignored, because it's pretty much willful ignorance at that point.

Quite likely it would require (access to) an actual C compiler, yes. C used to be simple, and while it's still simpler than C++ by a wide margin, in 2023 it hasn't been simple for at least 10 years.

I think that's already more so the case for kernel devs than random devs. It's kind of how the space operates, given what a kernel can do relative to userspace programs.
So all your proposal would end up doing is infuriating those kernel devs, relegating Rust to a 2nd class citizen.
How would that affect Rust adoption, in the kernel in particular as well as in general?

Well you're not wrong there. The starting point for C++ was essentially C + OOP, but since then it's metastasized into something far larger, uglier, maddening, more dangerous, internally inconsistent, less comprehensible and an all-round (security) liability. Not that I would expect Stroustrup to agree, of course.

1 Like

Yeah, it is fine from that perspective, the debugger could write an expression parser for it.
The problem is more how to interpret the DWARF when you see something which behaves differently between the languages, e.g. a DW_TAG_class_type: is it a C++ class or an ObjC one?

We should expect problems when we turn something like language which has always been injective, into something which is not a function.

Indeed, it has been long enough since I looked at this, and I'm still pre-coffee. But indeed you are right. I remember there are problems, but I'm struggling to remember exactly what they are. Edit: I remember the issue here being that gdb actually supports 3 different mangling schemes for objc (NeXT, GNU old, and GNU new). Some of these are ambiguous since they are limited to C symbol strings, and Apple also has theirs with '+' and '-'. So it takes care not to demangle them, only to mangle them and compare the mangled output, which makes things interesting.

I don't follow. None of those comments relate to ignorance, and none of them would be addressed by documentation. The first two appear to be genuine current technical limitations of the compiler of particular concern to Linux kernel devs. The next two are subjective opinions, and the last is generalised disdain.

1 Like

I think the issues with the Linux kernel are:

  • Linux isn't in C, it's in "C with gcc extensions"
  • The kernel includes preprocessor macros that need to be expanded inline in C functions (see here)
  • The kernel has weird security requirements having to do with Spectre/Meltdown mitigations

I don't think Rust can ever support a fully blown asm goto because it would need a fully blown goto first, and the borrow checker probably makes that impossible. It also comes with a considerable cost (my understanding is that using this feature disables several optimization passes). But likely something much weaker is all that's necessary.

3 Likes

I'm confused about "memops." Is the issue that in C we can typically constrain an x86 operand as a register, memory operand, etc.? You can write memory operands in Rust inline asm but it's pretty awkward IIRC.
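For reference, this is roughly what a memory operand looks like in Rust inline asm today: as far as I know there is no "m"-style constraint, so the address goes in a register and the memory operand is written by hand in the template. A minimal sketch, x86_64 only, with a plain-Rust fallback elsewhere so the snippet still builds; `load_u32` is a made-up name.

```rust
// x86_64: load a u32 through a hand-written memory operand.
#[cfg(target_arch = "x86_64")]
fn load_u32(p: &u32) -> u32 {
    let value: u32;
    // SAFETY: `p` is a valid, aligned reference, and the asm only reads 4 bytes from it.
    unsafe {
        std::arch::asm!(
            // Intel syntax is the default: the address is in a register,
            // and `[{addr}]` turns it into a memory operand.
            "mov {dst:e}, dword ptr [{addr}]",
            addr = in(reg) p as *const u32,
            dst = out(reg) value,
            options(nostack, readonly, preserves_flags),
        );
    }
    value
}

// Other architectures: ordinary load, only so the example compiles everywhere.
#[cfg(not(target_arch = "x86_64"))]
fn load_u32(p: &u32) -> u32 {
    *p
}

fn main() {
    let x = 0xDEAD_BEEF_u32;
    println!("{:#x}", load_u32(&x));
}
```

Compared with GCC-style constraints, the compiler cannot pick an addressing mode for you here, which is presumably the awkwardness in question.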

I was referring to ignorance w.r.t. certain technical aspects of Rust.
For example, ASM goto not being present was IIRC a purposeful omission i.e. there likely was a good reason for it when that decision was made.

It's not reasonable to expect everyone to know everything of course, but that's why I was referring to resources like URLO: any of those opinions could have been "battle-tested" (so to speak) by asking a couple of questions, as opposed to what they seem to have done i.e. just assuming something and running with that.

1 Like

The article should be given a little more weight. One never interfaces with C but rather with some implementation of C, and it is at each compiler's whim whether it provides more sensible and precise rules around ABI than the C specification requires. (And by this we also imply the existence of a function mapping rustc targets to some source-of-truth implementation of C that defines them, which seems a bit of a stretch.) Of course, this doesn't work at scale in practice.
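A small illustration of "some implementation of C": the width of `long` is not fixed by the language, and Rust's `std::ffi::c_long` simply follows whatever the target's C implementation chose (64 bits on typical 64-bit Linux, 32 bits on 64-bit Windows). A minimal sketch; `c_long_bits` is a made-up helper.

```rust
use std::ffi::{c_int, c_long};
use std::mem::size_of;

// Width of C's `long` on the target this was compiled for.
fn c_long_bits() -> usize {
    size_of::<c_long>() * 8
}

fn main() {
    println!("c_int is {} bits", size_of::<c_int>() * 8);
    println!("c_long is {} bits", c_long_bits());
}
```

The same source prints different numbers depending on the target, which is the point: the binding is to a platform's C, not to C.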

So, basically, considering this feature is:

  • broken on clang in MSVC compat mode
  • broken in rustc in #[repr(C)] while compiling for MSVC
  • broken in MSVC itself during C++ interop
  • not really clear whether it's the right thing to do for clang and gcc

Maybe we should not look into fixing this at all,
@oli-obk repr(C) is unsound on MSVC targets · Issue #81996 · rust-lang/rust · GitHub

And wasm32 is experiencing its own sort of pain of repr(C) and "implementation-defined", too.

Of course, C++ made it worse, but it makes it so bad that no one even tries to achieve anything all-encompassing, and libraries can relatively comfortably defer to relying on LLVM for parsing and producing type layouts. Not that this works flawlessly, but it doesn't typically give the wrong impression to careless users either.

2 Likes

So let me get this straight....

The C language has no ABI specified in its standard. So given any compiled C function in an object file, for example, one has no idea how parameters are passed in and out or where return values might be if one wants to call that function. These details are down to the implementation of the compiler/linker that produced the executable code of the function. They may well be different on every processor architecture (there is no way out of that), every compiler/linker, every operating system.

In the face of that it seems impossible for any other language, even assembler, to use that C function without knowing how some compiler implementation built it. The ABI in use.

Meanwhile, one might have built that C function on Windows with MSVC, say, but of course one's Rust code is built with LLVM. One has to pray they are both working to the same ABI.

It's amazing any of this ever works....!

It's kind of amazing that given the low level, "systems programming", nature of these languages, especially C which is often called a glorified assembler, that these fundamental details of how systems actually work are not nailed down rigorously by the languages.

They both ignore "the system" and leave things undefined.

Is there something wrong with this #[repr(C)] thing?

I mean it's OK as far as it goes, it fixes data layout in structs to be C compatible. But that is only a small part of the problem.
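As a quick sketch of what #[repr(C)] pins down: field order and padding follow the usual C rules, so the offsets below can be checked at run time. This assumes a target where `u32` has 4-byte alignment (true of the common ones) and a toolchain with `std::mem::offset_of!`; the `Header` struct is invented for the example.

```rust
use std::mem::{align_of, offset_of, size_of};

// Laid out like the equivalent C struct: fields in declaration order,
// with padding inserted to satisfy each field's alignment.
#[repr(C)]
struct Header {
    tag: u8,    // offset 0, followed by 3 bytes of padding
    len: u32,   // offset 4
    flags: u16, // offset 8, followed by 2 bytes of tail padding
}

fn main() {
    println!("size  = {}", size_of::<Header>());
    println!("align = {}", align_of::<Header>());
    println!("len   @ {}", offset_of!(Header, len));
    println!("flags @ {}", offset_of!(Header, flags));
}
```

Without the attribute, rustc is free to reorder the fields, so none of these offsets would be guaranteed.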

Don't we need something like this on functions and the like:

#[abi(name = "abi-name")]

After all, it is not C we are linking with; it's executable code conforming to some ABI.
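For comparison, Rust already has a spelling for this, though not as an attribute: the string after `extern` names the calling convention, and it becomes part of the function's type. A minimal sketch using the portable "C" and "system" ABI strings; the function names are made up.

```rust
// The string after `extern` selects the calling convention for this function.
extern "C" fn add_c(a: i32, b: i32) -> i32 {
    a + b
}

// "system" means whatever the platform's system APIs use
// (it differs from "C" on 32-bit Windows, for example).
extern "system" fn add_system(a: i32, b: i32) -> i32 {
    a + b
}

// The ABI is part of the function pointer type, so a callback parameter
// states which convention the caller will use.
fn call_c(f: extern "C" fn(i32, i32) -> i32, a: i32, b: i32) -> i32 {
    f(a, b)
}

fn main() {
    println!("{}", call_c(add_c, 2, 3));
    println!("{}", add_system(2, 3));
}
```

What the string does not tell you is which compiler's idea of "C" you are matching; that still comes from the target.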

1 Like

Indeed. Some is unavoidable of course (different architectures have different numbers of registers, some are little endian, some big, possibly different alignment requirements, ...). But then there are standards, such as the SYSV AMD64 ABI (System V ABI - OSDev Wiki). But Windows does its own thing of course.

That would probably be useful for projects like wine, where there is code that needs to use Windows calling conventions on one end and OS calling conventions on the other. IIRC gcc even supports some function attributes specifically for this purpose.
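A rough sketch of what such a two-sided function can look like in Rust on x86_64, where the "win64" and "sysv64" ABI strings are available: the entry point is compiled with the Windows convention and forwards to a function using the System V one. All names are invented, and the non-x86_64 fallback exists only so the snippet builds elsewhere.

```rust
#[cfg(target_arch = "x86_64")]
mod thunk {
    // "Host" side: System V AMD64 calling convention.
    pub extern "sysv64" fn host_add(a: u64, b: u64) -> u64 {
        a + b
    }

    // "Guest" side: Windows x64 calling convention. The compiler emits the
    // register shuffling needed to call across the two conventions.
    pub extern "win64" fn guest_add(a: u64, b: u64) -> u64 {
        host_add(a, b)
    }
}

#[cfg(target_arch = "x86_64")]
fn add_via_thunk(a: u64, b: u64) -> u64 {
    thunk::guest_add(a, b)
}

// Fallback for other architectures, where these ABI strings do not exist.
#[cfg(not(target_arch = "x86_64"))]
fn add_via_thunk(a: u64, b: u64) -> u64 {
    a + b
}

fn main() {
    println!("{}", add_via_thunk(40, 2));
}
```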

1 Like