C interface to `std`

Sorry if this has been brought up before.

Would people in the C world appreciate access to the Rust standard library? It has some really nice things in it that are far better than what you get in C, for example the collections (Vec, String, HashMap, BTreeMap). It would also mean that other crates could use Rust types rather than having to convert to null-terminated strings, for example. You would need to mark all structs #[repr(C)] and/or use extra indirection so that Rust-specific stuff is an opaque pointer in C land. The library would look something like:

typedef void* std_vec_t; /* or use repr(C) on the vec struct */

std_vec_t std_vec_new(void);
void std_vec_drop(std_vec_t);
void std_vec_push(std_vec_t, void*); /* vecs have to store pointer lists as we can't specify Rust generics from C */

/* and so on */
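For what it's worth, the Rust side of such a header might look roughly like the sketch below. This is only an illustration under assumptions: `StdVecT`, `std_vec_len` and `std_vec_get` are hypothetical names that are not in the header above, and a real library would additionally mark each function `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` in the 2024 edition) and build as a `staticlib` or `cdylib` so that C can link against the names.

```rust
use std::ffi::c_void;

/// Mirrors `typedef void* std_vec_t;` from the header (hypothetical Rust-side alias).
pub type StdVecT = *mut c_void;

/// The real type hidden behind the opaque handle: a vector of element pointers.
type Inner = Vec<*mut c_void>;

/// Allocate an empty vector on the heap and hand ownership to the caller.
pub extern "C" fn std_vec_new() -> StdVecT {
    Box::into_raw(Box::new(Inner::new())) as StdVecT
}

/// Append one element pointer. The vector does not take ownership of `item`.
pub unsafe extern "C" fn std_vec_push(v: StdVecT, item: *mut c_void) {
    // Safety: the caller must pass a live handle obtained from `std_vec_new`.
    let inner = unsafe { &mut *(v as *mut Inner) };
    inner.push(item);
}

/// Number of stored element pointers (hypothetical helper, not in the header).
pub unsafe extern "C" fn std_vec_len(v: StdVecT) -> usize {
    let inner = unsafe { &*(v as *const Inner) };
    inner.len()
}

/// Element at `index`, or NULL when out of range (hypothetical helper).
pub unsafe extern "C" fn std_vec_get(v: StdVecT, index: usize) -> *mut c_void {
    let inner = unsafe { &*(v as *const Inner) };
    inner.get(index).copied().unwrap_or(std::ptr::null_mut())
}

/// Free the vector itself. The pointed-to elements are left untouched.
pub unsafe extern "C" fn std_vec_drop(v: StdVecT) {
    if !v.is_null() {
        drop(unsafe { Box::from_raw(v as *mut Inner) });
    }
}
```

Because the handle is just a `void*`, C never needs to know the layout of `Vec`; the cost is one heap allocation for the handle plus one pointer indirection per element.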

Food for thought.


This would work even better if you could guarantee Rust enums had a given tagged union layout. A #[repr(C)] for enums.


I think it is an interesting idea; I thought of that HashMap idea as well. In that regard, doing this would be quite easy thanks to the safer-ffi framework (it would mainly be a matter of shimming each and every API).


There is: #[repr(C)] on enums with fields was specified in RFC 2195.
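As a minimal sketch of what that looks like in practice (the `Shape` type and `shape_area` function are made up for illustration), a `#[repr(C)]` enum with fields is laid out as a tag followed by a union of the payloads, so C can read and construct it by value:

```rust
use std::f64::consts::PI;

/// Hypothetical example type. With #[repr(C)], the layout is a C-style tag
/// followed by a union of one struct per variant.
#[repr(C)]
pub enum Shape {
    Circle { radius: f64 },
    Rect { width: f64, height: f64 },
}

/* Roughly the matching C declarations:

   enum ShapeTag { Circle, Rect };
   struct Shape {
       enum ShapeTag tag;
       union {
           struct { double radius; } circle;
           struct { double width; double height; } rect;
       } payload;
   };
*/

/// Compute the area of a shape passed in from C by pointer.
pub unsafe extern "C" fn shape_area(shape: *const Shape) -> f64 {
    // Safety: the caller must pass a valid, initialized `Shape`.
    let shape = unsafe { &*shape };
    match *shape {
        Shape::Circle { radius } => PI * radius * radius,
        Shape::Rect { width, height } => width * height,
    }
}
```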


For C++, a mini version of this exists in the CXX project:

https://github.com/dtolnay/cxx#builtin-types

The list of supported built-in types is currently quite small, but the goal is to allow using Rust types in C++ and vice versa.


Storing pointers instead of values would make the performance far worse, so the use cases would be very limited. A more viable, while still limited solution would be to generate FFI bindings for a predefined set of generic types (starting with all primitive integer and floating point types).
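One way to picture "bindings for a predefined set of generic types" is a macro that stamps out a monomorphic C API per element type. The sketch below is only an assumption about how that could be done; the `export_vec!` macro and all `std_vec_i32_*` / `std_vec_f64_*` names are hypothetical, and a real library would also need `#[no_mangle]` on each function to export the symbols.

```rust
use std::ffi::c_void;

/// Generate a by-value C API for `Vec<$elem>` behind an opaque `void*` handle.
macro_rules! export_vec {
    ($elem:ty, $new:ident, $push:ident, $get:ident, $drop:ident) => {
        /// Allocate an empty vector and hand ownership to the caller.
        pub extern "C" fn $new() -> *mut c_void {
            Box::into_raw(Box::new(Vec::<$elem>::new())) as *mut c_void
        }

        /// Push a value; elements are stored inline, not as pointers.
        pub unsafe extern "C" fn $push(v: *mut c_void, value: $elem) {
            let inner = unsafe { &mut *(v as *mut Vec<$elem>) };
            inner.push(value);
        }

        /// Copy element `index` into `out`; returns false when out of range.
        pub unsafe extern "C" fn $get(v: *const c_void, index: usize, out: *mut $elem) -> bool {
            let inner = unsafe { &*(v as *const Vec<$elem>) };
            match inner.get(index) {
                Some(value) => {
                    unsafe { *out = *value };
                    true
                }
                None => false,
            }
        }

        /// Free the vector and its elements.
        pub unsafe extern "C" fn $drop(v: *mut c_void) {
            if !v.is_null() {
                drop(unsafe { Box::from_raw(v as *mut Vec<$elem>) });
            }
        }
    };
}

// One instantiation per supported primitive type.
export_vec!(i32, std_vec_i32_new, std_vec_i32_push, std_vec_i32_get, std_vec_i32_drop);
export_vec!(f64, std_vec_f64_new, std_vec_f64_push, std_vec_f64_get, std_vec_f64_drop);
```

The handle is still opaque, but the elements themselves live contiguously inside the vector, which avoids the per-element indirection of the pointer-list design.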

Hmm, it seems unfortunate that Rust can completely replicate and use C types with repr(C) but the indeterminate nature of Rust types prevents any by-value usage of standard Rust types in C. I know that ABI stability is not a thing, but even without it there ought to be a way when youʼre compiling your Rust and C code together, at the same time. E.g. the Rust compiler creating header files with accurate struct layout information at compile time.

Or is this perhaps already possible with tooling? For starters, representing only the size and alignment (both information that the rust compiler can probably somehow provide) correctly through some generated header file would already allow handling types by value and passing them to (extern "C") Rust functions from C, right?
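To make the size-and-alignment idea concrete, here is a rough sketch of a helper that a build script could use to emit such a header line. Everything here is an assumption for illustration: `opaque_c_decl` is a hypothetical function, the output relies on C11 `_Alignas`, and it says nothing about how the platform ABI passes the resulting struct in registers.

```rust
use std::mem::{align_of, size_of};

/// Render a C typedef for an opaque blob with the same size and alignment as `T`.
/// The values are only valid for the exact compiler, target and flags used to
/// build this code, since repr(Rust) layout is not stable.
/// Note: a zero-sized `T` would produce `bytes[0]`, which is not valid standard C.
pub fn opaque_c_decl<T>(c_name: &str) -> String {
    format!(
        "typedef struct {{ _Alignas({align}) unsigned char bytes[{size}]; }} {name};",
        align = align_of::<T>(),
        size = size_of::<T>(),
        name = c_name
    )
}
```

For example, `opaque_c_decl::<Vec<u8>>("std_vec_u8_t")` yields a typedef whose byte array is exactly `size_of::<Vec<u8>>()` long, which is enough for C to reserve storage for the value without knowing its fields.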


Some ABIs specify different ways of passing things based on the kind of the data (e.g. integers and floating point values passed in different registers), so no, I don't think this works for all types.

Right, it may be a bit more involved than just size and alignment. E.g. I’m also not sure how big of an issue uninitialized memory in padding is, or what an extern "C" fn with repr(Rust) arguments and/or return type currently even means. There seems to be an “`extern` fn uses type `...`, which is not FFI-safe” kind of warning in place. But IMO these problems ought to be solvable. Solvable without requiring ABI stability or having to let C know where exactly all the fields are in a type etc.

It's intentional that repr(Rust) does not have a stable ABI. Two different structures that in C would have the same layout can have different layouts in Rust. That permits the Rust compiler to optimize each layout independently based on
  1) actual observed usage of the different fields and field-groupings in the program, which interact with architecture-specific cache-line optimization, and
  2) features of the specific target instruction set (e.g., when field access at offset-zero is more efficient).

If you want a human-predictable layout in which all instances have the same layout, which would block the above classes of code optimization, use repr(C).
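A small sketch of the difference (the struct names are made up): with the same three fields, repr(C) must keep declaration order and pad accordingly, while the default representation leaves the compiler free to reorder.

```rust
use std::mem::size_of;

/// Declaration order is preserved: u8, 3 bytes padding, u32, u8, 3 bytes padding.
#[repr(C)]
pub struct OrderedC {
    pub a: u8,
    pub b: u32,
    pub c: u8,
}

/// Default (repr(Rust)) layout: field order and padding are up to the compiler.
pub struct OrderedRust {
    pub a: u8,
    pub b: u32,
    pub c: u8,
}

/// Returns (size of the repr(C) struct, size of the repr(Rust) struct).
pub fn layout_sizes() -> (usize, usize) {
    (size_of::<OrderedC>(), size_of::<OrderedRust>())
}
```

On mainstream targets the repr(C) struct is 12 bytes. Current compilers typically shrink the repr(Rust) one to 8 bytes by moving `b` first, but that is an observation about today's rustc, not a guarantee.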


Yes, the pointer situation is unfortunate. I'm not sure how to improve it.

This raises the following question: How, when std is pre-compiled, can types like std::vec::Vec be generic? When does the monomorphization happen?

It happens during the compilation of the crate that uses it.

How, when std is pre-compiled, can types like std::vec::Vec be generic? When does the monomorphization happen?

That's not how Rust/cargo work. std is not precompiled.

While it's a rather niche case, my first project in Rust made me decide to throw in my 2 cents.

TL;DR, I think it would be a great thing technically, but based on my own experience, I don't know how many people would actually use it.

I learned Rust by taking a 13,000 line C library, doing an auto-generated translation to 100% unsafe code, and then cleaning it up to safe, idiomatic Rust. I can't say I'd recommend it, but it sure taught me a lot!

Around 4,000 of those lines were implementations of data structures similar to ones in the Rust standard library. They included:

  • dynamically reallocated arrays, akin to Rust's Vec<T>
  • linked lists
  • maps, implemented more like Java's LinkedHashMap<K, V> than Rust's HashMap<K, V>
  • heaps with arbitrary sort keys

Most of the library's data structures were built upon these "core" data structures, but it also exposed them as part of a "utility" API.

I got most of the way through getting rid of them, switching out their contents for Rust data structures and emulating the API. But there were some things I could never emulate, like iterators with APIs that "cheated" based on the C data layout.

The converted version became slightly faster (between 2% and 3%), and the test suite, still written in C, proved that it was easy to set up a Makefile to work with it. The Rust project simply had to generate a static library and keep unmangled C names that called the Rust functions.

It really was a drop-in replacement for the C original, in addition to providing a better Rust interface. However, I never finished it simply because I didn't see its value. I have since found another Rust library that does what it was trying to do, and I wasn't sure anyone would use it from C besides its own test suite.

Most of the programmers I have worked with used C for business reasons, e.g. using third-party proprietary toolchains for embedded devices. With the exception of Linux kernel developers (who already have their own data structures to use), everyone else who wrote performance-sensitive code used C++ with the STL.

Basically, IMHO, very few people meet all of these criteria:

  • Write mostly in C
  • Will include a Rust toolchain in the CI and cargo in their Makefile
  • Don't have a library that implements these data structures already
  • Don't want to switch languages to something with an actual standard library

I'm not saying they don't exist. I'm saying I've never met or heard of them.


How would one access, say, the Vec functionality through a C interface?

AFAIK, it is precompiled into the rlib. That is, everything non-generic is compiled to the native code, everything generic - to some kind of compiler-internal representation, to be monomorphized later.


This is how I imagined it must work, but I don't know the details.

[asciicast: terminal recording of a safer-ffi demo]

  • (FWIW, this demo is not even actually using the "proc-macros" feature. That being said, it is unusual to be using safer-ffi without calling #[derive_ReprC], which is what does require that feature)

That's soo cool.


Here are some FFI examples: https://github.com/hg2ecz/FFI_Rust_and_C