Bindgen / FFI to C++ map: invalid memory reference

Is bindgen unable to create bindings compatible with a C++ std::map? (Disclaimer: I'm a rust novice and know no C++, just a hobbyist learning as I go.)

I thought I was making a tiny bit of progress creating FFI bindings for TaskWarrior, and almost immediately found that I get errors like (signal: 11, SIGSEGV: invalid memory reference) when trying to instantiate a Context.

With a bunch of print debugging (I'm still trying to learn the ropes of rust-lldb), I narrowed down the problem to a line that sets a key:value in a C++ class which inherits from std::map<std::string, std::string>.

I thought perhaps the inheritance and use of (*this) was the issue (I've come across several blog posts suggesting that inheriting from std types is perhaps not a good approach), so I tried to make a MRE to see if composition vs inheritance made any difference: https://github.com/n8henrie/brokencppmap/tree/a9cadbcd7cab24fed9381e360b7da227f17ceb43 -- it doesn't; as soon I call the method to set the value in the map I get a crash. Running the c++ code from the toy main.cpp I wrote works as expected. (EDIT 20211127: updated the link to refer to the original commit.)

To save people the click on the MRE repo, hopefully this is the meat of the matter:

void Configuration::set_inherited() {
    (*this)["foo"] = "bar";
}

void Configuration::print_inherited() {
    auto found = find ("foo");
    std::cout << found -> second << std::endl;
}

void Configuration::set_composed() {
    submap["baz"] = "qux";
}

void Configuration::print_composed() {
    auto found = submap.find("baz");
    std::cout << found -> second << std::endl;
}

Both the main.cpp and fn test_map() just call set_{inherited,composed} and then the corresponding print_ with the --nocapture flag; from rust, it crashes in between the calls to set_ and print_.

My Googling is not returning many results, but I've seen several warnings in the bindgen docs that there is limited support for C++; is that what I'm what I'm running up against? Is it time for me to delve into something like CXX?

Thanks as always for the help.

Generally speaking, when you want to create bindings to C++, it's not a good idea to expose std types in your headers since libc++(clang, whose headers bindgen uses) and libstdc++(gcc) are not binary compatible. Running cargo test does indeed show several failing tests related to generated layouts on my system (WSL). It might work on a MacOS though.
You can still use std types in your source files (.cpp).
I cloned the repo and used a pointer to implementation technique and it now runs correctly:

diff --git a/src/lib.rs b/src/lib.rs
index c5c935e..01b842c 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -8,8 +8,8 @@ mod tests {
 
     #[test]
     fn test_map() -> Result<()> {
-        let mut config = Configuration::default();
         unsafe {
+            let mut config = Configuration::new();
             dbg!("before");
             // config.set_inherited();
             config.set_composed();
diff --git a/vendor/proj/proj.cpp b/vendor/proj/proj.cpp
index 163c2b4..e049a65 100644
--- a/vendor/proj/proj.cpp
+++ b/vendor/proj/proj.cpp
@@ -1,20 +1,22 @@
 #include "proj.h"
+#include <map>
 #include <iostream>
 
-void Configuration::set_inherited() {
-    (*this)["foo"] = "bar";
-}
+struct Configuration::ConfigImpl {
+    std::map<std::string, std::string> map;
+};
 
-void Configuration::print_inherited() {
-    auto found = find ("foo");
-    std::cout << found -> second << std::endl;
+Configuration::Configuration() : impl(new ConfigImpl) {}
+
+Configuration::~Configuration() {
+    delete impl;
 }
 
-void Configuration::set_composed() {
-    submap["baz"] = "qux";
+void Configuration::set_composed() {;
+    impl->map["baz"] = "qux";
 }
 
 void Configuration::print_composed() {
-    auto found = submap.find("baz");
+    auto found = impl->map.find("baz");
     std::cout << found -> second << std::endl;
 }
diff --git a/vendor/proj/proj.h b/vendor/proj/proj.h
index 40ce2e0..1d349f5 100644
--- a/vendor/proj/proj.h
+++ b/vendor/proj/proj.h
@@ -1,13 +1,12 @@
-#include <map>
-#include <stdlib.h>
-#include <string>
+#pragma once
 
-class Configuration : public std::map <std::string, std::string>
+class Configuration
 {
+    struct ConfigImpl;
+    ConfigImpl *impl;
     public:
+        Configuration();
+        ~Configuration();
         void set_composed();
         void print_composed();
-        void set_inherited();
-        void print_inherited();
-        std::map<std::string, std::string> submap;
 };

Doing so makes it also easier for bindgen, less code to go through since you no longer include std headers, so less points of failure.

Regarding the cxx crate, it supports certain C++ std types (string, vector, unique_ptr and shared_ptr). I don't think it supports other containers at the moment.

3 Likes

Thanks for the patch! (EDIT 20211129: Pulled into a branch here)

I'm on an M1 Mac on Monterey; I can confirm tests pass and I get the expected output with this patch.

Let me take a look and see if I can apply the lessons learned to the original problem. Thank you!

I'm still struggling here unfortunately; @MoAlyousef's solution clearly works but depends on modifying the code in vendor/, whereas I'm hoping to create bindings that work for a project whose source I don't control.

Looking at the diff, it seemed like an important part of refactoring the C++ was making it more straightforward to initialize the struct from Rust.

I thought this might be a good time for me to learn about MaybeUninit; to me, it looks like the C++ code may be crashing because it declares an uninitialized class (that inherits from std::map) and then uses a method to set a value (which is where things crash in Rust). Since it runs as expected in C++, I was hoping that following a similar pattern from rust -- let foo = MaybeUninit::uninit(), and then using Configuration_set_inherited(foo.as_mut_ptr()); let foo = foo.assume_init(); would allow C++ to do the initialization. Unfortunately it crashes all the same, with the same error as above (invalid memory reference). Here's my most recent attempt.

Perhaps I need to (learn how to) write a C wrapper. Marking these to look through tomorrow:

Thanks for mentioning this. I removed the stdlib include and will keep this in mid going home. Also added .opaque_type("std::.*") which seems to be recommended in a few threads here and at SO.

Ok, I found a really helpful blog post and was able to write my first attempt at a C wrapper.

I have no idea if I'm doing things well, poorly, or what the safety ramifications are, but the tests are printing the expected output -- progress!

The wrapping code can be simplified even further:

// wrapper.h
#ifdef __cplusplus
extern "C" {
#endif

// it's an opaque type for all C cares
typedef struct Configuration Configuration;

Configuration *newConfig();
void delConfig(Configuration *c);
void setInherited(Configuration *c);
void printInherited(Configuration *c);
void setComposed(Configuration *c);
void printComposed(Configuration *c);

#ifdef __cplusplus
}
#endif
// wrapper.cpp
#include "wrapper.h"
#include "vendor/proj/proj.h"

Configuration * newConfig() {
    return new Configuration;
}

void setInherited(Configuration *c) {
    c->set_inherited();
}

void printInherited(Configuration *c) {
    c->print_inherited();
}

void setComposed(Configuration *c) {
    c->set_composed();
}

void printComposed(Configuration *c) {
    c->print_composed();
}

void delConfig(Configuration *c) {
    delete c;
}

Normally you would try to avoid names like config_t, _t suffixes are reserved for posix and you might run into symbol collision.
Another thing is the names of the wrapping functions. These are functions which are available in the global namespace, so you would try to give them names like Configuration_new, Configuration_setInherited etc. And if you were exposing them as part of a library, then you can also prefix the library's name (or initials), all to avoid collisions.