I have C++ code which decomposes a std::string_view into a pointer and a length and passes them to a Rust extern "C" function, which uses std::slice::from_raw_parts to construct a slice from which a &str is built.
Is this safe to do? What are the common pitfalls I might fall into?
unsafe extern "C" fn with_string_view(
    ptr: *const u8,
    len: u64,
) {
    let slice = std::slice::from_raw_parts(ptr, len as usize);
    let str = std::str::from_utf8_unchecked(slice);
    use_str(str)
}
fn use_str(x: &str) {
    ...
}
A few pitfalls that I see:
string_view is not required to hold valid UTF-8. (Fixable, and in my case it is a given.)
Memory comes from C++, so if somebody finds a way to get a mutable reference to it, this will cause UB. (Most likely this is abusable)
Can Rust references even point to memory allocated in C++?
I looked at how others implemented this, and it seems that all of those implementations can cause UB if somebody gets mutable access to the same string while the references are alive.
Is there a safe way to do that at all? Would it require always copying a string?
This is also possible, but I am mostly worried about the ownership rules, as it might be possible for safe code to get a mutable reference to the same memory.
The next thing I am worried about is whether references are special, and whether I am even allowed to create them to memory that was not allocated by Rust. It seems like yes, but I am not sure about the rules there.
The safety requirements of std::slice::from_raw_parts apply, which effectively means no mutating or deallocating the memory until this function returns.
It is also UB to call the function with non-UTF-8 data.
Since you can't guarantee all the requirements, the function needs to stay unsafe, and it will be the responsibility of the caller (in whatever language) to uphold them, for example by making sure safe code can't get a mutable reference to the same memory for the duration of the call.
If that's impossible to guarantee, you'll need to do something else.
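One possible "something else" is to validate the bytes and copy them into an owned String right away, so that nothing on the Rust side keeps pointing into the C++ buffer afterward. This is only a minimal sketch: copy_string_view is a hypothetical helper name, usize is assumed for the length, and the buffer must still be valid and unmodified while the copy is being made.

```rust
use std::slice;

/// Copies a buffer described by pointer and length into an owned `String`.
///
/// Hypothetical helper for illustration; not part of the original code.
///
/// # Safety
/// `ptr` must be non-null and point to `len` readable bytes that are not
/// mutated or freed while this function runs.
unsafe fn copy_string_view(ptr: *const u8, len: usize) -> Option<String> {
    // The shared borrow of the foreign memory ends when this function returns.
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    // Checked conversion: invalid UTF-8 yields `None` instead of UB.
    std::str::from_utf8(bytes).ok().map(str::to_owned)
}
```

The trade-off is one allocation and copy per call, in exchange for a much shorter window during which the aliasing requirements have to hold.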
Pointing a Rust reference at memory allocated in C++ is fine if all the other requirements are met; what allocated the memory (if anything) doesn't matter.
References aren't special. They are just pointers with extra validity requirements that the compiler assumes hold. Specifically, the memory must be live and hold a valid instance of T for as long as the &T or &mut T is live. You must also uphold the aliasing requirements: nothing may mutate the referenced memory as long as a &T is live, and no other active (= used in any way) pointers to the memory may exist as long as a &mut T is live. The usual exceptions for UnsafeCell and UnsafePinned apply.
In general, these properties are impossible to uphold if you're dealing with C++ code, since C++ doesn't obey Rust's aliasing requirements and is quite lax with const-correctness. But if you don't call into C++ for the duration of your Rust function, or can guarantee that any calls do not violate the requirements for the slice, then all is well.
One other thing to consider is zero-length strings: you need to ensure the pointer isn't null, since C/C++ code will often pass null for zero-length buffers, while std::slice::from_raw_parts requires a non-null pointer even when the length is zero, and Rust &str references must be non-null. You can just return "" if the length is zero, since rustc will give you a valid &str that way.
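The zero-length case can be handled with an early return before the pointer is ever turned into a slice. A minimal sketch, assuming a hypothetical helper named str_from_parts and a usize length; it keeps the unchecked UTF-8 conversion from the question, so the caller still has to guarantee valid UTF-8:

```rust
use std::slice;

/// Builds a `&str` from a pointer and length, tolerating a null pointer
/// when the length is zero.
///
/// Hypothetical helper for illustration; not part of the original code.
///
/// # Safety
/// If `len > 0`, `ptr` must point to `len` bytes of valid UTF-8 that stay
/// live and unmodified for the lifetime `'a` chosen by the caller.
unsafe fn str_from_parts<'a>(ptr: *const u8, len: usize) -> &'a str {
    if len == 0 {
        // Never touch `ptr` here: it may be null for an empty string_view.
        return "";
    }
    // A null pointer with a non-zero length is a caller bug.
    debug_assert!(!ptr.is_null());
    unsafe { std::str::from_utf8_unchecked(slice::from_raw_parts(ptr, len)) }
}
```

The returned lifetime is unbounded, so it is up to the caller to keep the result from outliving the C++ buffer.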