Using Unsafe for Fun and Profit


#1

TL;DR: I’ve been working on something of a Rust FFI guide in my spare time and now I’m looking for feedback (rendered/github)

At work in an effort to clean up the codebase and remove a lot of the existing memory bugs, I’ve written a couple Rust libraries which can be imported as DLLs by the main program. There wasn’t as much information on the net about Rust and FFI as I’d like (besides the basics of how to call C/C++ code), and a lot of it was scattered across various blog posts and forum threads… So I thought I’d write down a couple of the lessons I’ve had to learn the hard way so that others don’t have to.

This is very much still a work in progress, however if there are things you’d like to know about Rust and FFI, feel free to create an issue on github and I’ll try to add a page for it. Likewise, if you spot any errors or think I’m doing something hideously wrong/unsafe then I really want to hear from you!

While the FFI section of The Book is fantastic and I learned a lot from it, it doesn’t (and arguably, shouldn’t) go into all the low level details of writing a Foreign Function Interface and bindings from other languages. The idea is this will be a tutorial for more advanced users who may want to use Rust alongside their existing codebase.

Let me know if there are any comments, complaints, questions or queries.


#2

Ha, that’s interesting, I talked to a friend about writing such a thing on Friday. You’ll definitely see me there at some point.

I’ve got a complaint though: You are “Using unsafe for fun and profit”, not abusing it. I hate it when “abuse” is used as a stand-in for “advanced/creative” usage. Two reasons:

a) Abuse is something terrible and negative - it’s generally considered out of bounds.
b) you are working right within the bounds of the system. Calling it “abuse” delegitimizes the fact that you are actually doing something very intended and great! Calling the guide that also scares away users searching for advanced content, because it implies that this is somehow out of bounds.


#3

You have a good point about the use of “abuse”, however in the process of learning how to do FFI in a reasonably safe way I had to do quite a few cringe-worthy things to try and make everything work.

A really good example of this is when I was using a boxed trait object. Basically you might have a bunch of different types which implement a Shape trait so you can have a heterogeneous collection of Shapes which are allocated on the heap. In the destructor, typically you’d get your raw pointer, reconstruct a Box from it, then let Rust drop it naturally. Thing is, you can’t create a box from a pointer to an unsized type or check if it’s a null pointer with some_ptr.is_null() so I had to do a couple unholy things to make things work how I wanted them to.

I’ve since found a much nicer solution, but that’s where the “abusing” part came from anyway. I think I’ll change it to “using unsafe for fun and profit” though, that sounds a lot better.
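The post doesn’t say what the nicer solution was. One approach that is often used for this problem (an assumption here, not necessarily what the guide ended up doing) is to box the trait object a second time. `Box<dyn Shape>` is a fat pointer, but `Box<Box<dyn Shape>>` is a thin pointer to a sized value, so it can be null-checked and rebuilt with `Box::from_raw`. `Circle`, `Square`, `ShapeHandle` and the `shape_*` functions are made-up names for illustration.

```rust
use std::f64::consts::PI;

// Hypothetical trait and implementors, standing in for the Shape example.
pub trait Shape {
    fn area(&self) -> f64;
}

pub struct Circle {
    pub radius: f64,
}

pub struct Square {
    pub side: f64,
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        PI * self.radius * self.radius
    }
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

// The outer Box is a thin pointer, so it can cross the FFI boundary as an
// ordinary pointer even though the inner Box<dyn Shape> is a fat pointer.
pub type ShapeHandle = *mut Box<dyn Shape>;

// A real cdylib would also mark these functions `#[no_mangle]` so the symbols
// are exported under these names; that is left out of this sketch.
pub extern "C" fn shape_new_circle(radius: f64) -> ShapeHandle {
    let shape: Box<dyn Shape> = Box::new(Circle { radius });
    Box::into_raw(Box::new(shape))
}

pub extern "C" fn shape_new_square(side: f64) -> ShapeHandle {
    let shape: Box<dyn Shape> = Box::new(Square { side });
    Box::into_raw(Box::new(shape))
}

/// Safety: `handle` must be null or a live pointer returned by `shape_new_*`.
/// Returning 0.0 for a null handle is an arbitrary choice made for this sketch.
pub unsafe extern "C" fn shape_area(handle: ShapeHandle) -> f64 {
    if handle.is_null() {
        return 0.0;
    }
    let shape: &Box<dyn Shape> = unsafe { &*handle };
    shape.area()
}

/// Safety: `handle` must be null or a pointer returned by `shape_new_*` that
/// has not been destroyed yet.
pub unsafe extern "C" fn shape_destroy(handle: ShapeHandle) {
    if !handle.is_null() {
        // Rebuild the outer Box and let Rust drop both boxes normally.
        drop(unsafe { Box::from_raw(handle) });
    }
}
```

The cost is one extra allocation and one extra pointer hop per object, which is usually acceptable for a handle that lives on the other side of an FFI boundary.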


#4

Sure, but just because something currently can’t (and probably never will) be solved in a simple fashion doesn’t mean you are working out of bounds :). Just the solution isn’t simple and needs a lot of understanding (and a book to read along!).

It’s precisely those things I talked about last Friday.

See you on the issue tracker!


#5

Thanks! This was an interesting read!
I like how you spend time explaining how to embed rust into other languages, since I feel that is how rust will “sneak into” existing projects. More sharing on that topic really helps intermediate programmers like myself!

One thing I was wondering while reading: would switching Rust’s allocator help or hinder?

You list two alternatives of dealing with memory

  1. each language allocates and frees its own bits, guest-programmer must provide destructors that the host calls.
  2. host language provides buffers, guest language limited to stack-only. (No allocations).

This makes sense to me, we shouldn’t let the guest allocator free allocations tracked by the host allocator, and vice versa, because that messes up the internal bookkeeping of both allocators.
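For reference, option 1 can be sketched roughly like this on the Rust side. It is a minimal illustration, not code from the guide: `Counter` and the `counter_*` functions are hypothetical names, and the host is expected to treat the pointer as opaque and hand it back to `counter_destroy` when it is done.

```rust
// Hypothetical type owned by the Rust side; the host only ever sees a pointer.
pub struct Counter {
    value: u32,
}

// A real cdylib would also mark these functions `#[no_mangle]` so the host
// can link against them by name; that is left out of this sketch.

/// Allocates a Counter with Rust's allocator and hands ownership to the host.
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { value: 0 }))
}

/// Safety: `counter` must be null or a live pointer from `counter_new`.
/// Returns the new value, or 0 if the pointer is null.
pub unsafe extern "C" fn counter_increment(counter: *mut Counter) -> u32 {
    match unsafe { counter.as_mut() } {
        Some(c) => {
            c.value += 1;
            c.value
        }
        None => 0,
    }
}

/// The destructor the host must call. Memory allocated by Rust is freed by
/// Rust, so the host's allocator never touches it.
/// Safety: `counter` must be null or a pointer from `counter_new` that has
/// not been destroyed yet.
pub unsafe extern "C" fn counter_destroy(counter: *mut Counter) {
    if !counter.is_null() {
        drop(unsafe { Box::from_raw(counter) });
    }
}
```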

I see an option 3: If you’d use the system allocator in rust, would you be able to free() a rust-struct from C? Is that a valid option, or am I overlooking something?

I guess the accounting burden on the programmer becomes bigger (who-frees-what suddenly is no longer the trivial A-or-B answer from above), but not bigger than it would have been in pure C…

Question two: could we somehow make option 2 safer by using #[no_std], because that tells rust that there is no allocator that we can accidentally use. Or is that overkill?


#6

I’m not sure to be honest, I believe when you compile your crate as a cdylib it’ll automatically use the system allocator, but I’ve never thought to look into that. That’s definitely something I should add a page for!

My personal feeling is that going down this path, while making things easier, makes the programmer’s job a lot more difficult. You go from having a precise rule which says “only free something from the language it was allocated in”, to a situation where you need to remember who frees what on a case-by-case basis. That’s just my opinion though, and it might easily change if I tried it out.

One concern is that a lot of types in Rust would have warnings similar to the one under CString::into_raw().

The pointer must be returned to Rust and reconstituted using from_raw to be properly deallocated. Specifically, one should not use the standard C free function to deallocate this string.

Failure to call from_raw will lead to a memory leak.
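The pairing that warning describes looks roughly like this. It is a minimal sketch: `greeting_new` and `greeting_free` are made-up names, and the only point is that the string goes out through `CString::into_raw` and comes back through `CString::from_raw`.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// A real cdylib would also mark these functions `#[no_mangle]`; that is left
// out of this sketch.

/// Returns a NUL-terminated string allocated by Rust. The caller owns it until
/// it is handed back to `greeting_free`.
pub extern "C" fn greeting_new() -> *mut c_char {
    let text = CString::new("Hello from Rust").expect("no interior NUL bytes");
    text.into_raw()
}

/// Safety: `ptr` must be null or a pointer from `greeting_new` that has not
/// been freed yet. Calling C's free() on it instead is not supported.
pub unsafe extern "C" fn greeting_free(ptr: *mut c_char) {
    if !ptr.is_null() {
        // Rebuild the CString so Rust deallocates it with its own allocator.
        drop(unsafe { CString::from_raw(ptr) });
    }
}
```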

This is definitely an option that I’ll look into. I’ve played around with #[no_std] a little bit when I was putting Rust on an STM32 microcontroller. If rust doesn’t use an allocator, how do you pass a struct back to another language? You can’t pass back a pointer to something on the stack in the Rust function, because when the Rust function returns that stack frame gets popped and your struct goes away. That’s why I’ve been allocating on the heap with Box and passing back a pointer to that box.

Note that this isn’t exactly true. The caller can easily malloc their own buffer and pass it to you. For example, something like this is just as valid as creating an array on the stack:

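#include <stdio.h>
#include <stdlib.h>

/* Exported by the Rust library; signature assumed from the call below. */
int get_version_string(char *buffer, int length);

int main(void) {
  char *buffer = malloc(sizeof(char) * 10);
  if (buffer == NULL) return 1;

  get_version_string(buffer, 10);
  printf("Version: %s\n", buffer);

  free(buffer);
  return 0;
}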

I should make that more obvious though, cheers for pointing it out.
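The Rust side of a function like that could look something like the sketch below. The signature, the return-value convention and the `VERSION` constant are all assumptions made for illustration; the thread only shows the C call site. Nothing here allocates, so it does not matter whether the caller’s buffer came from the stack or from malloc.

```rust
use std::os::raw::{c_char, c_int};
use std::ptr;

// Placeholder version text; a real library might use env!("CARGO_PKG_VERSION").
const VERSION: &str = "0.1.0";

// A real cdylib would also mark this function `#[no_mangle]`; that is left
// out of this sketch.

/// Copies the version into `buffer` as a NUL-terminated string.
/// Returns the number of bytes written (not counting the NUL), or -1 if the
/// buffer is null or too small. This convention is an assumption.
/// Safety: `buffer` must be null or point to at least `length` writable bytes.
pub unsafe extern "C" fn get_version_string(buffer: *mut c_char, length: c_int) -> c_int {
    if buffer.is_null() || length < 0 {
        return -1;
    }
    let bytes = VERSION.as_bytes();
    // The buffer needs room for the text plus the terminating NUL.
    if (length as usize) < bytes.len() + 1 {
        return -1;
    }
    unsafe {
        ptr::copy_nonoverlapping(bytes.as_ptr(), buffer as *mut u8, bytes.len());
        *buffer.add(bytes.len()) = 0;
    }
    bytes.len() as c_int
}
```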


#7

Just to make my intentions absolutely clear: my questions are intended as a learning experience for me, I feel I still lack the actual experience to provide constructive criticism. Thank you for the time you’ve taken to write up your hard-earned lessons of experience!

That’s exactly what I suspected, I am glad that sensible people agree :) It’s not worth the extra headaches for the lib-users.

I think that is what I intended… you force all allocations to happen in the host language, because the guest language allocator is disabled. My “no allocations” was intended to apply only for the guest language.

Exactly: you handicap yourself to host-provided buffers only. But the more I think about it, the more that sounds overly impractical. Why take away your option of leaking stuff to the host language?

A final thing that occurred to me, after sleeping on this last night: I didn’t see a single mem::forget() in your code. This seemed pervasive in other tutorials I’ve seen on this topic.
I understand that Box::into_raw() and CString::into_raw() take care of forgetting for these types, but shouldn’t it be introduced for ‘raw’ structs?


#8

The main reason I tried to avoid doing an explicit mem::forget() is because Box::into_raw() effectively does this under the hood. Box::into_raw() also helps you continue on with Rust’s idea of ownership. Even though you’re passing around a raw pointer, conceptually you’re passing ownership of some struct back to the caller. Just with the caveat that they’ll need to pass ownership back to some destructor so it can be deallocated.

If you think of it from that perspective then using mem::forget() doesn’t make sense. You aren’t trying to forget the thing because you’re actually passing ownership of it to whichever language called that function… I hope that kinda makes sense.

Additionally, because I’m allocating everything on the heap I’m able to use Box::into_raw() and Box::from_raw() for everything. Even then, if the caller passed in some uninitialized struct that I then write data to, it’s allocated on the stack so you don’t need an explicit mem::forget(). That happens automatically when the caller’s stack frame gets popped.

I guess you could say I feel like mem::forget() is a hack to get around the ownership rules, whereas using Box::into_raw() allows you to pass ownership of some heap-allocated struct back to the caller, via raw pointers. I know all of this is unsafe code, but I feel like it’s less unsafe to use the into_raw/from_raw method.
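To make the difference concrete, here is a small sketch that counts destructor calls. `Tracked`, `DROPS` and the three helper functions are hypothetical names. Both `Box::into_raw()` and `mem::forget()` skip the destructor, but only `into_raw()` leaves you with a pointer that can be turned back into a `Box` later.

```rust
use std::mem;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many `Tracked` values have been dropped.
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

pub struct Tracked {
    pub id: u32,
}

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Hands ownership out as a raw pointer. The destructor does not run here.
pub fn give_away() -> *mut Tracked {
    Box::into_raw(Box::new(Tracked { id: 1 }))
}

/// Takes ownership back and lets Rust drop the value normally.
/// Safety: `ptr` must be null or an unreclaimed pointer from `give_away`.
pub unsafe fn take_back(ptr: *mut Tracked) {
    if !ptr.is_null() {
        drop(unsafe { Box::from_raw(ptr) });
    }
}

/// `mem::forget` also skips the destructor, but no pointer is kept, so this
/// value (and its heap allocation) is deliberately leaked for good.
pub fn forget_it() {
    mem::forget(Box::new(Tracked { id: 2 }));
}
```

With `give_away()` followed by `take_back()` the counter goes up by exactly one, at the point where ownership comes back. After `forget_it()` it never moves.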


#9

Thank you!
If I understand you correctly, into_raw() is more meaningful with regard to the intention, even if under the hood it does a forget() (but that’s just an implementation detail).
into/from_raw is obviously a translation to/from a more basic representation, whereas forget is a “Huh, why’d they do that? Why forget when we want to give it away? Shouldn’t we keep it around?”

Effectively: It’s the same reason why we should name our classes “dbConnectionPool” and not “Barney”. Even if the functionality is the same, the former is way easier to reason about for the poor human.

I have decided I like this about your tutorial: Rather than my previous reads, which were all “let’s open the hood and see how the black magic works”, yours is more “OK, here’s how to get stuff done in a productive way that will make sense when you have to debug it two months from now”.


#10

Ah the lovely C language! It’s actually fairly common for a call to return allocated memory which you must take special care to deallocate (might not be appropriate to just call free).


#11

@stevedonovan you make a good point there. I’ve used C code where it’s the caller’s job to deallocate memory once you’re done, but I don’t think I’ve done it enough to know whether it’s a good practice or not. Would you say that’s something I should promote when talking about how to design your library to work well with other languages? I know having to manually call a destructor for everything can be a bit of a pain, so maybe it’s easier to let the user call free on data…

Also, can you think of any other things I should mention? A C++ programmer friend of mine mentioned memory management being really important, and you can’t use variadic functions like printf() directly. But we couldn’t really think of any other gotchas to look out for when doing FFI or interacting with C.
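On the variadic point: as far as I know, stable Rust can declare and call a C variadic function through an extern block (the `...` in the declaration); what it can’t do is define one of its own. A minimal sketch, using `snprintf` instead of `printf` so the result can be inspected; `format_with_c` is a made-up helper name.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int};

// Declaration of the C library's snprintf. The `unsafe extern` spelling needs
// Rust 1.82 or newer; older toolchains write plain `extern "C"` instead.
unsafe extern "C" {
    fn snprintf(buf: *mut c_char, size: usize, format: *const c_char, ...) -> c_int;
}

/// Formats `value` with C's snprintf and returns the result as a Rust String.
pub fn format_with_c(value: c_int) -> String {
    let mut buf = [0 as c_char; 32];
    // The format string must be NUL-terminated for C.
    let format = b"value=%d\0";
    let written = unsafe {
        snprintf(
            buf.as_mut_ptr(),
            buf.len(),
            format.as_ptr() as *const c_char,
            value,
        )
    };
    // snprintf returns a negative value on error, or the length it wanted.
    assert!(written >= 0 && (written as usize) < buf.len());
    let text = unsafe { CStr::from_ptr(buf.as_ptr()) };
    text.to_string_lossy().into_owned()
}
```

The compiler can’t check the variadic arguments against the format string, so a mismatch between `%d` and the argument type is undefined behaviour just as it is in C.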


#12

The classic example would be having to use fclose if you have done a fopen - basically, explicit destructors. But it’s all over the place: after opendir, have to call closedir and so forth. Really is the Wild West! But with C, how else could one do it, if the returned memory is non-trivially organized, or requires other resource cleanup?
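When C code like that is called from Rust, the explicit destructor can be tucked into a `Drop` impl so it can’t be forgotten. A rough sketch, assuming a Unix-like libc; `RawFile` and `CFile` are hypothetical names, and only the two C functions mentioned above are declared.

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

// Opaque stand-in for C's FILE; Rust only ever holds a pointer to it.
#[repr(C)]
pub struct RawFile {
    _private: [u8; 0],
}

// The `unsafe extern` spelling needs Rust 1.82 or newer; older toolchains
// write plain `extern "C"` instead.
unsafe extern "C" {
    fn fopen(path: *const c_char, mode: *const c_char) -> *mut RawFile;
    fn fclose(file: *mut RawFile) -> c_int;
}

/// Owns a FILE* and closes it when dropped.
pub struct CFile {
    raw: *mut RawFile,
}

impl CFile {
    /// Returns None if the path or mode contains a NUL byte, or if fopen fails.
    pub fn open(path: &str, mode: &str) -> Option<CFile> {
        let path = CString::new(path).ok()?;
        let mode = CString::new(mode).ok()?;
        let raw = unsafe { fopen(path.as_ptr(), mode.as_ptr()) };
        if raw.is_null() {
            None
        } else {
            Some(CFile { raw })
        }
    }
}

impl Drop for CFile {
    fn drop(&mut self) {
        // `raw` is non-null by construction. The return value is ignored
        // because Drop has no way to report an error.
        unsafe {
            fclose(self.raw);
        }
    }
}
```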


#13

One thing I would mention is that when statically linking against a C library from Rust, you still have to generate position-independent code (-fPIC) when compiling the static C library.