Bindgen / FFI to C++ map: invalid memory reference

Is bindgen unable to create bindings compatible with a C++ std::map? (Disclaimer: I'm a rust novice and know no C++, just a hobbyist learning as I go.)

I thought I was making a tiny bit of progress creating FFI bindings for TaskWarrior, and almost immediately found that I get errors like (signal: 11, SIGSEGV: invalid memory reference) when trying to instantiate a Context.

With a bunch of print debugging (I'm still trying to learn the ropes of rust-lldb), I narrowed down the problem to a line that sets a key:value in a C++ class which inherits from std::map<std::string, std::string>.

I thought perhaps the inheritance and use of (*this) was the issue (I've come across several blog posts suggesting that inheriting from std types is perhaps not a good approach), so I tried to make a MRE to see if composition vs inheritance made any difference: https://github.com/n8henrie/brokencppmap/tree/a9cadbcd7cab24fed9381e360b7da227f17ceb43 -- it doesn't; as soon I call the method to set the value in the map I get a crash. Running the c++ code from the toy main.cpp I wrote works as expected. (EDIT 20211127: updated the link to refer to the original commit.)

To save people the click on the MRE repo, hopefully this is the meat of the matter:

void Configuration::set_inherited() {
    (*this)["foo"] = "bar";
}

void Configuration::print_inherited() {
    auto found = find ("foo");
    std::cout << found -> second << std::endl;
}

void Configuration::set_composed() {
    submap["baz"] = "qux";
}

void Configuration::print_composed() {
    auto found = submap.find("baz");
    std::cout << found -> second << std::endl;
}

Both the main.cpp and fn test_map() just call set_{inherited,composed} and then the corresponding print_ with the --nocapture flag; from rust, it crashes in between the calls to set_ and print_.

My Googling is not returning many results, but I've seen several warnings in the bindgen docs that there is limited support for C++; is that what I'm what I'm running up against? Is it time for me to delve into something like CXX?

Thanks as always for the help.

Generally speaking, when you want to create bindings to C++, it's not a good idea to expose std types in your headers since libc++(clang, whose headers bindgen uses) and libstdc++(gcc) are not binary compatible. Running cargo test does indeed show several failing tests related to generated layouts on my system (WSL). It might work on a MacOS though.
You can still use std types in your source files (.cpp).
I cloned the repo and used a pointer to implementation technique and it now runs correctly:

diff --git a/src/lib.rs b/src/lib.rs
index c5c935e..01b842c 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -8,8 +8,8 @@ mod tests {
 
     #[test]
     fn test_map() -> Result<()> {
-        let mut config = Configuration::default();
         unsafe {
+            let mut config = Configuration::new();
             dbg!("before");
             // config.set_inherited();
             config.set_composed();
diff --git a/vendor/proj/proj.cpp b/vendor/proj/proj.cpp
index 163c2b4..e049a65 100644
--- a/vendor/proj/proj.cpp
+++ b/vendor/proj/proj.cpp
@@ -1,20 +1,22 @@
 #include "proj.h"
+#include <map>
 #include <iostream>
 
-void Configuration::set_inherited() {
-    (*this)["foo"] = "bar";
-}
+struct Configuration::ConfigImpl {
+    std::map<std::string, std::string> map;
+};
 
-void Configuration::print_inherited() {
-    auto found = find ("foo");
-    std::cout << found -> second << std::endl;
+Configuration::Configuration() : impl(new ConfigImpl) {}
+
+Configuration::~Configuration() {
+    delete impl;
 }
 
-void Configuration::set_composed() {
-    submap["baz"] = "qux";
+void Configuration::set_composed() {;
+    impl->map["baz"] = "qux";
 }
 
 void Configuration::print_composed() {
-    auto found = submap.find("baz");
+    auto found = impl->map.find("baz");
     std::cout << found -> second << std::endl;
 }
diff --git a/vendor/proj/proj.h b/vendor/proj/proj.h
index 40ce2e0..1d349f5 100644
--- a/vendor/proj/proj.h
+++ b/vendor/proj/proj.h
@@ -1,13 +1,12 @@
-#include <map>
-#include <stdlib.h>
-#include <string>
+#pragma once
 
-class Configuration : public std::map <std::string, std::string>
+class Configuration
 {
+    struct ConfigImpl;
+    ConfigImpl *impl;
     public:
+        Configuration();
+        ~Configuration();
         void set_composed();
         void print_composed();
-        void set_inherited();
-        void print_inherited();
-        std::map<std::string, std::string> submap;
 };

Doing so makes it also easier for bindgen, less code to go through since you no longer include std headers, so less points of failure.

Regarding the cxx crate, it supports certain C++ std types (string, vector, unique_ptr and shared_ptr). I don't think it supports other containers at the moment.

3 Likes

Thanks for the patch! (EDIT 20211129: Pulled into a branch here)

I'm on an M1 Mac on Monterey; I can confirm tests pass and I get the expected output with this patch.

Let me take a look and see if I can apply the lessons learned to the original problem. Thank you!

I'm still struggling here unfortunately; @MoAlyousef's solution clearly works but depends on modifying the code in vendor/, whereas I'm hoping to create bindings that work for a project whose source I don't control.

Looking at the diff, it seemed like an important part of refactoring the C++ was making it more straightforward to initialize the struct from Rust.

I thought this might be a good time for me to learn about MaybeUninit; to me, it looks like the C++ code may be crashing because it declares an uninitialized class (that inherits from std::map) and then uses a method to set a value (which is where things crash in Rust). Since it runs as expected in C++, I was hoping that following a similar pattern from rust -- let foo = MaybeUninit::uninit(), and then using Configuration_set_inherited(foo.as_mut_ptr()); let foo = foo.assume_init(); would allow C++ to do the initialization. Unfortunately it crashes all the same, with the same error as above (invalid memory reference). Here's my most recent attempt.

Perhaps I need to (learn how to) write a C wrapper. Marking these to look through tomorrow:

Thanks for mentioning this. I removed the stdlib include and will keep this in mid going home. Also added .opaque_type("std::.*") which seems to be recommended in a few threads here and at SO.

Ok, I found a really helpful blog post and was able to write my first attempt at a C wrapper.

I have no idea if I'm doing things well, poorly, or what the safety ramifications are, but the tests are printing the expected output -- progress!

The wrapping code can be simplified even further:

// wrapper.h
#ifdef __cplusplus
extern "C" {
#endif

// it's an opaque type for all C cares
typedef struct Configuration Configuration;

Configuration *newConfig();
void delConfig(Configuration *c);
void setInherited(Configuration *c);
void printInherited(Configuration *c);
void setComposed(Configuration *c);
void printComposed(Configuration *c);

#ifdef __cplusplus
}
#endif
// wrapper.cpp
#include "wrapper.h"
#include "vendor/proj/proj.h"

Configuration * newConfig() {
    return new Configuration;
}

void setInherited(Configuration *c) {
    c->set_inherited();
}

void printInherited(Configuration *c) {
    c->print_inherited();
}

void setComposed(Configuration *c) {
    c->set_composed();
}

void printComposed(Configuration *c) {
    c->print_composed();
}

void delConfig(Configuration *c) {
    delete c;
}

Normally you would try to avoid names like config_t, _t suffixes are reserved for posix and you might run into symbol collision.
Another thing is the names of the wrapping functions. These are functions which are available in the global namespace, so you would try to give them names like Configuration_new, Configuration_setInherited etc. And if you were exposing them as part of a library, then you can also prefix the library's name (or initials), all to avoid collisions.

1 Like

Wow, that's a thing of beauty. Thank you so much!

I was a little discouraged at how complicated this seemed with the style I found above, but this really seems approachable now.

Here's another post I found that follows this style: https://www.teddy.ch/c++_library_in_c/

Updated repo: https://github.com/n8henrie/brokencppmap/tree/bde690efe19b86f692d746b0411c96b63f529fd5

1 Like

Since you asked about CXX: yes it is very well suited to this kind of binding. It avoids the boilerplate that you did manually in your wrapper.cpp. Here is the diff, which is +53 –97.

1 Like

Wow, thanks so much for the patch!

I pulled it into a CXX branch: https://github.com/n8henrie/brokencppmap/tree/a5e655c46e8c18dcca3d970f6cc41eb6874ace25

That is really nice how much it cleans up the build.rs and eliminates wrapper.cpp.

Side note: it's funny to me that the first response to your question about considering CXX was:

when in reality it's bindgen that can't cope with std::map being in the header that it processes, while CXX has no issue with it as shown in the commit.

How would CXX deal with std::map in the interface, i.e functions taking or returning a std::map?

CXX handles arbitrary types. The list you gave is types for which CXX ships a high-quality builtin binding i.e. an idiomatic Rust API for you on the Rust side, but something doesn't need a builtin binding in order for you to use it. The vast majority of types you'd want to use do not have a builtin binding; for example the class Configuration in @n8henrie's repo is not a type that CXX would know about, yet the commit shows them using it via CXX. There isn't anything different about how you'd use std::map vs Configuration.

This example code on the website shows working with a C++ JSON library built on std::variant, including getting a std::map in Rust from a JSON object, performing a lookup in it, and printing out the value.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.