I am working on a Rust / C binding and have to convert between C strings of the type char* and String, in both directions. Basically the situation on Stack Overflow or in the forum.
My intention here is to understand why things are the way they are, not just how to make it work.
First: Why does Rust use u8 instead of i8, which would match the common C idiom of using char*?
I presume because u8 matches the underlying type, a text in UTF-8. Signedness does not make much sense in the context of a text character - and I agree.
So we can go from String to Vec<u8>, borrow mutably and get a pointer to it, which always leaves us with *const u8 or *mut u8. However, C commonly uses char instead of unsigned char, which requires me to do something like:
    use std::ffi::CString;

    fn main() {
        let s = String::from("Hallo Welt!");
        let cs = CString::new(s).unwrap();
        let cv: Vec<u8> = cs.into_bytes_with_nul();
        let mut tmp: Vec<i8> = cv.into_iter().map(|c| c as i8).collect(); // line 7
        let _cptr: *mut i8 = tmp.as_mut_ptr();
    }
To summarize:
- String to CString to add the NUL termination
- CString to Vec<u8> so we can iterate over bytes
- Vec<u8> to Vec<i8> so we can get a char later; also need Vec to get a mutable reference to its contents
(I know there is CString::into_raw() -> *mut c_char, which does that all in one, but I'd have to reclaim the memory later, which does not work for me at the moment.)
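For reference, this is a minimal sketch of the into_raw route I am avoiding, as I understand it. The function name roundtrip_through_raw is made up for illustration, and the place where the C call would go is only marked by a comment. The point is that the pointer has to be handed back to CString::from_raw at some point, otherwise the allocation leaks:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Hypothetical helper: hands ownership to a raw pointer and takes it back.
fn roundtrip_through_raw(s: &str) -> String {
    let cs = CString::new(s).expect("input must not contain an interior NUL");

    // Ownership of the buffer leaves Rust here; nothing frees it automatically.
    let raw: *mut c_char = cs.into_raw();

    // ... a C function would receive `raw` here (assumed, not shown) ...

    // SAFETY: `raw` came from CString::into_raw above and the string length
    // was not changed in between, so from_raw may reclaim the allocation.
    let reclaimed = unsafe { CString::from_raw(raw) };
    reclaimed.into_string().expect("bytes are still valid UTF-8")
}
```

Calling roundtrip_through_raw("Hallo Welt!") should give the same text back. In my case the reclaiming step is exactly what I cannot arrange yet.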
I understand that passing a *const T instead of *mut T is dangerous:
Rust may correctly presume that the contents are unchanged, when in fact the C routine changed them (which it might anyway, disregarding the const qualifier). Also, string literals might land in a read-only memory segment, so changing const to mut might segfault.
Second, however, why is a type cast from *mut u8 to *mut i8 dangerous in any way?
In the Forum post u/ExpHP writes it is undefined behavior. AFAICT "undefined behavior" is a matter of "the Rust compiler team defining it to be so" - and I am fine with that.
Why is line 7 from above better than:
    use std::ffi::CString;

    fn main() {
        let s = String::from("Hallo Welt!");
        let cs = CString::new(s).unwrap();
        let mut cv: Vec<u8> = cs.into_bytes_with_nul();
        let _cptr: *mut i8 = cv.as_mut_ptr() as *mut i8; // typecast here!
    }
What could possibly go wrong?
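To make the comparison concrete, here is a small sketch that reads the bytes back through both routes. The function names convert_by_copy and read_through_cast are mine, and the unsafe read relies on my assumption that u8 and i8 have the same size and alignment. It only shows that both routes see the same bit patterns; it does not settle whether something else about the cast is a problem:

```rust
use std::ffi::CString;

/// Route 1 (hypothetical name): copy every byte into a new Vec<i8>.
fn convert_by_copy(s: &str) -> Vec<i8> {
    let cs = CString::new(s).expect("input must not contain an interior NUL");
    cs.into_bytes_with_nul()
        .into_iter()
        .map(|b| b as i8) // reinterprets the bit pattern, e.g. 0xC3 becomes -61
        .collect()
}

/// Route 2 (hypothetical name): keep the Vec<u8> and cast only the pointer.
fn read_through_cast(s: &str) -> Vec<i8> {
    let cs = CString::new(s).expect("input must not contain an interior NUL");
    let mut bytes: Vec<u8> = cs.into_bytes_with_nul();
    let len = bytes.len();
    let ptr: *mut i8 = bytes.as_mut_ptr() as *mut i8;

    // SAFETY (as far as I can tell): `ptr` points at `len` initialized bytes
    // owned by `bytes`, which stays alive and untouched while `view` is used.
    let view: &[i8] = unsafe { std::slice::from_raw_parts(ptr, len) };
    view.to_vec()
}
```

For "Hallo Welt!" both functions return the same values including the trailing 0. The one difference I can see for certain is that route 1 allocates a second buffer, while route 2 points into the original Vec<u8>, which therefore has to outlive the pointer.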
Thanks in advance!