Passing strings back and forth between luajit and rust

I'm trying to pass a string from luajit to rust and return a new string but have been unable to get it to work fully. I also suspect the code will have plenty of room for improvement. Help with this would be greatly appreciated!

I've looked at the luajit-to-rust example in alexcrichton/rust-ffi-examples and at issue #2 ("Returning an owned string") of shepmaster/rust-ffi-omnibus on GitHub for inspiration.
My incomplete implementation looks like this right now:

# Rust
extern crate libc;
use libc::c_char;
use std::ffi::CStr;
use std::str;

#[no_mangle]
pub extern fn greet_from_rust(subject_c_ptr: *const c_char) -> *const u8 {
  let subject_c = unsafe {
      assert!(!subject_c_ptr.is_null());

      CStr::from_ptr(subject_c_ptr)
  };
  let subject = str::from_utf8(subject_c.to_bytes()).unwrap();

  let mut new_string: String = "hey, ".to_owned();
  new_string.push_str(subject);

  new_string.shrink_to_fit();
  let new_string_ptr = new_string.as_ptr();
  std::mem::forget(new_string);
  return new_string_ptr;
}
-- lua
local ffi = require "ffi"
ffi.cdef [[
  const char* greet_from_rust(const char* subject);
]]
local greet_from_rust = ffi.load('../rust/target/release/libgreet_from_rust.so').greet_from_rust
local result = greet_from_rust("peter")
local greeting = ffi.string(result, 10) -- length of "hey, peter", hard-coded for now
print(greeting) -- successfully prints "hey, peter"

However, I was not able to replace the return type *const u8 with *const c_char and lua does not know of the length of the new string. Maybe we could introduce a struct with two members, the string and its length?

Also, who now owns new_string? Do luajit and its GC dispose of it later?
Can we make this code any faster by not copying everything, but working on the C string/char* directly?

It looks like you're almost there. At the very end, you might instead use CString, which will take care of getting you a *const c_char and will do the mem::forget for you. e.g.,

let cstring = CString::new(new_string);
cstring.into_raw()

This transfers ownership from Rust to LuaJIT. However, since the memory was allocated by Rust, it must also be deallocated by Rust. So you'll also need a "free" function that takes the string returned above and frees it, e.g.,

#[no_mangle]
pub extern fn greet_free(s: *mut c_char) {
    // Retake ownership so that Rust drops (and frees) the string.
    if s.is_null() { return; }
    unsafe { drop(CString::from_raw(s)); }
}

And that's it.
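
A minimal sketch of how the two functions might fit together. It assumes the input is a valid NUL-terminated string, uses std::os::raw::c_char in place of the libc crate, and leaves out #[no_mangle], which you would add to both functions when building the shared library. Because the text comes from a CStr it cannot contain an interior NUL byte, so the result of CString::new is unwrapped with expect here.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Builds "hey, <subject>" and hands ownership of the result to the caller.
/// The caller must pass the returned pointer to `greet_free` exactly once.
pub extern "C" fn greet_from_rust(subject: *const c_char) -> *mut c_char {
    assert!(!subject.is_null());
    // Safety: assumes `subject` points to a valid NUL-terminated string.
    let subject = unsafe { CStr::from_ptr(subject) };
    let greeting = format!("hey, {}", subject.to_string_lossy());
    // A CStr has no interior NUL bytes, so the greeting has none either
    // and CString::new cannot fail here.
    CString::new(greeting)
        .expect("greeting contains no interior NUL")
        .into_raw()
}

/// Takes ownership back so that Rust's allocator frees the string.
pub extern "C" fn greet_free(s: *mut c_char) {
    if s.is_null() {
        return;
    }
    // Safety: `s` must come from `greet_from_rust` and not be freed already.
    unsafe { drop(CString::from_raw(s)) };
}
```

On the Lua side the result can then be read with ffi.string(result) without a length, since the string is NUL-terminated, followed by a call to greet_free.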

If you're returning a *const c_char, then you're basically saying, "I want to deal with C strings." C strings are nul-terminated and carry no length information. If LuaJIT uses a different style (i.e., ptr + length), then you should probably stick with that. You certainly might consider defining a C struct for that, or just passing everything as function parameters, e.g.,

#[no_mangle]
pub extern fn greet_from_rust(message: *const u8, message_len: size_t, new_message: *mut *const u8, new_message_len: *mut size_t) {
   // ...
}

Where the "inputs" to the function are the first two parameters and the outputs are the latter two parameters.
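
A rough sketch of that out-parameter style, under a few assumptions that are not in the thread: usize stands in for libc's size_t, the function returns a bool so that invalid UTF-8 does not panic across the FFI boundary, and greet_free_bytes is a hypothetical companion that releases the buffer. #[no_mangle] is omitted and would be needed for a real export.

```rust
use std::{slice, str};

/// Inputs: `message`/`message_len`. Outputs: `new_message`/`new_message_len`.
/// Returns false (and writes nothing) if the input is not valid UTF-8.
pub extern "C" fn greet_from_rust(
    message: *const u8,
    message_len: usize,
    new_message: *mut *const u8,
    new_message_len: *mut usize,
) -> bool {
    assert!(!message.is_null() && !new_message.is_null() && !new_message_len.is_null());
    // Safety: assumes `message` points to `message_len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(message, message_len) };
    let name = match str::from_utf8(bytes) {
        Ok(name) => name,
        Err(_) => return false,
    };
    // A boxed slice has no spare capacity, so the length alone is enough
    // to rebuild and free it later.
    let boxed: Box<[u8]> = format!("hey, {}", name).into_bytes().into_boxed_slice();
    let len = boxed.len();
    let ptr = Box::into_raw(boxed) as *mut u8;
    // Safety: both out-pointers were checked for NULL above.
    unsafe {
        *new_message = ptr as *const u8;
        *new_message_len = len;
    }
    true
}

/// Hypothetical companion: frees a buffer produced by `greet_from_rust`.
pub extern "C" fn greet_free_bytes(ptr: *mut u8, len: usize) {
    if ptr.is_null() {
        return;
    }
    // Safety: `ptr`/`len` must be exactly the pair written by `greet_from_rust`.
    unsafe { drop(Box::from_raw(std::ptr::slice_from_raw_parts_mut(ptr, len))) };
}
```

The matching C declaration would take const char** (or const uint8_t**) for new_message and size_t* for new_message_len.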

Which style you choose depends on the problem you're trying to solve, the magnitude of your API (IMO) and various other things probably.

If you have a char*, then it seems like you're limited to either shrinking it or changing it in place, but increasing its size in place seems like folly unless you have a priori knowledge about the size of allocation. It's hard to say more without knowing more about the problem you're trying to solve.
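
As an illustration of changing a string in place without copying, here is a small sketch. The function name shout_in_place is made up for this example. It assumes the caller owns a writable, NUL-terminated buffer; a Lua string must not be modified this way, since Lua strings are immutable, so the buffer would have to be something like a char array created with ffi.new.

```rust
use std::os::raw::c_char;

/// Upper-cases ASCII letters in a caller-owned, NUL-terminated buffer.
/// The length never changes, so no allocation or ownership transfer happens.
pub extern "C" fn shout_in_place(s: *mut c_char) {
    if s.is_null() {
        return;
    }
    let mut p = s as *mut u8;
    // Safety: assumes `s` is writable and NUL-terminated.
    unsafe {
        while *p != 0 {
            *p = (*p).to_ascii_uppercase();
            p = p.add(1);
        }
    }
}
```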

Thanks for your feedback. I've continued but hit two walls. The first wall was hit when cstring.into_raw() complained that

src/lib.rs:27:11: 27:19 error: no method named `into_raw` found for type `core::result::Result<std::ffi::c_str::CString, std::ffi::c_str::NulError>` in the current scope
src/lib.rs:27   cstring.into_raw()

Leaving that aside for the moment, I like your idea of passing everything in as function parameters. However, I hit the next wall trying to implement it. I'm assuming it's something obvious why the assignment fails, but I cannot figure it out.

extern crate libc;
use libc::c_char;
use libc::size_t;
use std::ffi::CStr;
use std::str;

#[no_mangle]
pub extern fn greet_from_rust(message: *const c_char, message_len: size_t, new_message: *mut *const u8, new_message_len: *mut size_t) {
  let subject_c = unsafe {
      assert!(!message.is_null());
      CStr::from_ptr(message)
  };
  let subject = str::from_utf8(subject_c.to_bytes()).unwrap();

  let mut new_string: String = "hey, ".to_owned();
  new_string.push_str(subject);

  new_string.shrink_to_fit();
  new_message = new_string.as_ptr(); // <-- this fails with "expected `*mut *const u8`, found `*const u8`"
  std::mem::forget(new_string);
}

I realize it's probably a beginner's question, but any help would be greatly appreciated!

This is because CString::new returns a Result rather than a CString, so you need to handle the possible error (or unwrap it) before calling into_raw. See the documentation for CString in std::ffi.

If you need help with error handling in general, then the book has a chapter on it.
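
For instance, one way to handle that Result without panicking is to map the error case to a NULL pointer. This is only a sketch: into_c_ptr is a hypothetical helper name, and returning NULL on failure is an assumed convention that the Lua side would have to check for.

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical helper: converts an owned String into a C string pointer,
/// returning NULL instead of panicking if the text has an interior NUL byte.
pub fn into_c_ptr(text: String) -> *mut c_char {
    match CString::new(text) {
        // Ownership moves to the caller, who must hand it back to Rust to free.
        Ok(c_string) => c_string.into_raw(),
        Err(_) => ptr::null_mut(),
    }
}
```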

new_message has type *mut *const u8 and new_string.as_ptr() has type *const u8. They are incompatible types---you can't assign one to the other.

Just like in C, though, *variable is an lvalue, which means it can be assigned to. *new_message in particular has type *const u8, which is what you want. Therefore, *new_message = new_string.as_ptr() should work.
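
A tiny sketch of that assignment through an out-parameter. To keep ownership out of the picture it points at static data, so nothing needs to be freed; the names static_greeting and GREETING are made up for the example.

```rust
static GREETING: &[u8] = b"hey, peter";

/// Writes a pointer and a length through the two out-parameters.
pub extern "C" fn static_greeting(out: *mut *const u8, out_len: *mut usize) {
    assert!(!out.is_null() && !out_len.is_null());
    // Safety: both out-pointers were checked for NULL above.
    unsafe {
        // `out` has type *mut *const u8, so `*out` is a place of type *const u8.
        *out = GREETING.as_ptr();
        *out_len = GREETING.len();
    }
}
```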

Of course, I forgot to unwrap. CString works now for the above version.
Btw, why would I pass in message_len if I cannot use it in CStr::from_ptr?

Putting it all together, I get the below. Unfortunately, lua prints gibberish. From the rust side of things, is there anything missing or is it Lua that's doing something incorrect?

# rust
extern crate libc;
use libc::c_char;
use libc::size_t;
use std::ffi::CStr;
use std::ffi::CString;
use std::str;

#[no_mangle]
pub extern fn greet_from_rust(message: *const c_char, message_len: size_t, new_message: *mut c_char, new_message_len: *mut size_t) {
  let subject_c = unsafe {
      assert!(!message.is_null());

      CStr::from_ptr(message)
  };
  let subject = str::from_utf8(subject_c.to_bytes()).unwrap();

  let mut new_string: String = "hey, ".to_owned();
  new_string.push_str(subject);

  new_string.shrink_to_fit();
  let cstring = CString::new(new_string).unwrap();
  let ptr = cstring.into_raw();
  unsafe {
    *new_message = ptr;
    *new_message_len = new_string.len();
  }
}
-- lua
local ffi = require "ffi"
ffi.cdef [[
  void greet_from_rust(const char* message, size_t message_len, char* new_message, size_t* new_message_len);
]]

local ot = ffi.typeof "char[?]"
local sz = ffi.new "size_t[1]"
local bf = ffi.new(ot, 10)

local librust = ffi.load "../rust/target/release/librust.so"
librust.greet_from_rust("peter", 5, bf, sz)
local greeting = ffi.string(bf, sz[0])
ngx.print(greeting) -- should print "hey, peter", instead prints random gibberish

You wouldn't need message_len in that case. If you're using CStr, then that means you're using C strings, which are nul terminated and don't carry explicit length information.

It's been a long time since I've used Lua's ffi, so I'm not familiar with its details. If it doesn't use C strings but instead uses a pointer to some bytes and a length, then you probably shouldn't be using CStr or CString at all. Instead, you'll want to create a slice of u8 from your pointer/length with std::slice::from_raw_parts, and you can then create a &str from there with std::str::from_utf8.
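
A short sketch of that pointer + length route. The function char_count is a hypothetical example, usize stands in for size_t, and returning -1 for a NULL pointer or invalid UTF-8 is an assumed error convention rather than anything from the thread.

```rust
use std::{slice, str};

/// Hypothetical helper: counts the characters in a pointer + length string,
/// returning -1 for a NULL pointer or invalid UTF-8 instead of panicking.
pub extern "C" fn char_count(message: *const u8, message_len: usize) -> isize {
    if message.is_null() {
        return -1;
    }
    // Safety: assumes `message` points to `message_len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(message, message_len) };
    match str::from_utf8(bytes) {
        Ok(text) => text.chars().count() as isize,
        Err(_) => -1,
    }
}
```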

For all intents and purposes, "Lua doing something incorrect" is exceedingly unlikely.

I don't know what's wrong, but it looks like ffi.string(bf, sz[0]) means Lua doesn't use C strings, so you shouldn't be using CStr/CString at all.

Fair enough, so let's return to *const u8. To reduce the complexity, I have also hard-coded the length of the new string.
The output is still gibberish. Recalling the first piece of code I posted, when we return the string, everything works. When we use pointers, we get gibberish. Might that be the root of the problem?

pub extern fn greet_from_rust(message: *const u8, message_len: size_t, new_message: *mut *const u8, new_message_len: *mut *const size_t) {
  let message_slice = unsafe {
      assert!(!message.is_null());

      slice::from_raw_parts(message, message_len as usize)
  };
  let message = str::from_utf8(message_slice).unwrap();

  let mut new_string: String = "hey, ".to_owned();
  new_string.push_str(message);

  new_string.shrink_to_fit();
  unsafe {
    *new_message = new_string.as_ptr();
  }
  std::mem::forget(new_string);
}
-- lua
local ffi = require "ffi"
ffi.cdef [[
  void greet_from_rust(const char* message, size_t message_len, char* new_message, size_t* new_message_len);
]]

local ot = ffi.typeof "char[?]"
local sz = ffi.new "size_t[1]"
local bf = ffi.new(ot, 10)

local librust = ffi.load "experiments/rust/target/release/librust.so"
librust.greet_from_rust("peter", 5, bf, sz)
local greeting = ffi.string(bf, 10)
print(greeting) -- should print "hey, peter", instead prints gibberish

You might be conflating *mut *const u8 with *const u8 on the Lua side. For example, you're passing bf to greet_from_rust, but you're passing the same bf to ffi.string. The former should be a pointer to a string (so a *mut *const u8) whereas I'd expect the latter to just be the string itself (so a *const u8).

Yes, that is what I am trying to do. I am not at all familiar with C syntax. I thought that char* new_message was declaring a pointer and that *new_message = new_string.as_ptr(); was changing that pointer to point at new_string, which would also update the value of bf in Lua. Clearly, that's not happening, so how can I make it so?

I've experimented with other approaches, but passing a buffer back to lua is still my ideal solution. Unfortunately, I'm quite stuck, so any help here would really help me out. I'd love to get this working and integrate rust into my app.

The Rust and Lua function signatures do not match:

  1. On the Rust side, new_message is a double indirection (pointer to pointer to char); on the Lua side it's just a single indirection (pointer to char). To me, the Rust side looks correct.
  2. On the Rust side, new_message_len is also a double indirection (pointer to pointer to size_t); on the Lua side it's again only a single indirection (pointer to size_t). In this case, the Lua side seems correct.
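
To illustrate why such a mismatch presumably shows up as gibberish, the following pure-Rust sketch simulates the call: a zeroed 10-byte char buffer is passed where Rust expects a slot for a pointer, so Rust stores an address into the first bytes of the buffer and the text itself never arrives there. All names are made up, and the sketch assumes pointers are at most 8 bytes wide.

```rust
static GREETING: &[u8] = b"hey, peter";

/// Simplified stand-in for the Rust function: stores a pointer through `out`.
fn store_pointer(out: *mut *const u8) {
    // write_unaligned because the simulated char buffer has no alignment guarantee.
    unsafe { out.write_unaligned(GREETING.as_ptr()) };
}

/// Simulates the Lua call: a zeroed char[10] is passed where a pointer slot
/// is expected, and the buffer contents afterwards are returned.
pub fn buffer_after_call() -> [u8; 10] {
    let mut buffer = [0u8; 10];
    store_pointer(buffer.as_mut_ptr() as *mut *const u8);
    buffer
}
```

Reading the buffer back as text yields the raw bytes of an address rather than the greeting. Declaring the parameter as a pointer to a pointer on the Lua side, and passing a one-element pointer array, would make the two sides agree.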

For strings I usually use a different kind of interface though (at least for FFI):

  • Allocate on the lua side
  • Pass the buffer to the rust function
  • Rust function writes into the buffer and returns the length written

To find out the required buffer size, first call the function with NULL as buffer parameter, the function then returns the required buffer size.

That way the ownership of the allocated memory is always on the right side of the FFI boundary, you don't have to mem::forget and you don't have to remember where to free the memory.
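
A sketch of what such an interface could look like on the Rust side. The name greet_into and the exact convention (always return the required size, write only when the buffer is non-NULL and large enough) are assumptions made for illustration; usize stands in for size_t and #[no_mangle] is omitted.

```rust
use std::ptr;

const PREFIX: &[u8] = b"hey, ";

/// Returns the number of bytes the greeting needs. Bytes are written only
/// when `buf` is non-NULL and `buf_len` is large enough.
pub extern "C" fn greet_into(
    message: *const u8,
    message_len: usize,
    buf: *mut u8,
    buf_len: usize,
) -> usize {
    let needed = PREFIX.len().saturating_add(message_len);
    if message.is_null() || buf.is_null() || buf_len < needed {
        return needed; // size query, or buffer too small: write nothing
    }
    // Safety: assumes both pointers are valid for the stated lengths
    // and that the two regions do not overlap.
    unsafe {
        ptr::copy_nonoverlapping(PREFIX.as_ptr(), buf, PREFIX.len());
        ptr::copy_nonoverlapping(message, buf.add(PREFIX.len()), message_len);
    }
    needed
}
```

The caller first passes NULL to learn the size, allocates that many bytes, calls again, and then reads exactly the returned number of bytes.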

That kind of interface is a very interesting direction. The only downside is that it takes one more FFI round trip to obtain the length before passing in the allocated buffer.
Anyways, I'd like to try it out and benchmark it. Am I correct in understanding that we need to copy the contents of the already allocated String new_string into the output buffer *mut u8 new_message? My naive implementation is below - can it be improved?

pub extern fn greet_from_rust(message: *const u8, message_len: size_t, new_message: *mut u8, new_message_len: *mut size_t) {
  assert!(!message.is_null());
  assert!(!new_message.is_null());
  assert!(!new_message_len.is_null());
  let message_slice = unsafe { slice::from_raw_parts(message, message_len as usize) };
  let new_message_slice = unsafe { slice::from_raw_parts_mut(new_message, 10 as usize) };
  let new_message_len_slice = unsafe { slice::from_raw_parts_mut(new_message_len, 1 as usize) };

  let mut new_string: String = "hey, ".to_owned();
  let message = str::from_utf8(message_slice).unwrap();
  new_string.push_str(message);

  let new_string_length = new_string.len();
  new_message_len_slice[0] = new_string_length;

  let new_string_bytes = new_string.into_bytes();
  for i in 0..new_string_length {
    new_message_slice[i] = new_string_bytes[i];
  }
}

There's no need to allocate at all inside the function:

...
let prefix = "hey, ".as_bytes(); // clone_from_slice needs a &[u8], not a &str
new_message_slice[..prefix.len()].clone_from_slice(prefix);
new_message_slice[prefix.len()..prefix.len() + message_slice.len()].clone_from_slice(message_slice);

(Disclaimer: written without compiler aid and untested...)

EDIT: it's clone_from_slice, not copy_from_slice
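
A self-contained version of the same idea, written as a safe helper with a bounds check so that a too-small buffer is reported instead of causing a panic. The name write_greeting and the Option return are assumptions for this sketch; the FFI wrapper would build `out` and `name` with slice::from_raw_parts_mut and slice::from_raw_parts as in the code above.

```rust
/// Hypothetical safe core: writes "hey, " + `name` into `out` and returns the
/// number of bytes written, or None if `out` is too small.
pub fn write_greeting(out: &mut [u8], name: &[u8]) -> Option<usize> {
    let prefix = b"hey, ";
    let total = prefix.len() + name.len();
    if out.len() < total {
        return None;
    }
    // Source and destination slices must have equal lengths, hence the ranges.
    out[..prefix.len()].clone_from_slice(prefix);
    out[prefix.len()..total].clone_from_slice(name);
    Some(total)
}
```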

Perfect, it's all working now. Thanks!