While continuing to work on my Rust FFI library, I learned two things:
- Memory can only be exchanged correctly if it is allocated by the host application. The function `foo_get_text()` demonstrates how the host application can extract the content of `Foo.stext`.
- The variable `itextlen` is actually the buffer length of the string, so it must include space for the NUL byte.
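To make the second point concrete, here is a minimal sketch of what the host has to compute, written in Rust only for illustration. It assumes that `itextlen` is the UTF-8 byte length of the text (not its character count) plus one byte for the NUL terminator; `buffer_length` is a hypothetical helper, not part of the library.

```rust
use std::ffi::CString;

/// Hypothetical helper: the value a host would pass as `itextlen`, i.e. the
/// UTF-8 byte length of the text plus one byte for the NUL terminator.
fn buffer_length(text: &CString) -> u32 {
    text.as_bytes_with_nul().len() as u32
}

fn main() {
    let text = CString::new("Foo Têxte Contént utf-8 0123").expect("no interior NUL");
    // 'ê' and 'é' take two bytes each in UTF-8, so bytes != characters
    let chars = text.to_str().expect("valid UTF-8").chars().count();
    let strlen = text.as_bytes().len();
    // prints "characters: 28, strlen(): 30, itextlen: 31"
    println!("characters: {chars}, strlen(): {strlen}, itextlen: {}", buffer_length(&text));
}
```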
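```rust
use std::ffi::{CStr, CString};
use std::slice;

#[no_mangle]
/// # Safety
///
/// `pstext` must point to at least `itextlen` readable bytes, the last of which
/// (`pstext[itextlen - 1]`) is the NUL byte, with no NUL byte before it.
/// The memory must stay valid as long as the returned `Foo` is used,
/// because `Foo.stext` borrows it instead of copying it.
pub unsafe extern "C" fn foo_new(pstext: *const u8, itextlen: u32) -> Box<Foo<'static>> {
    // A missing or empty buffer cannot even hold the NUL byte
    if pstext.is_null() || itextlen == 0 {
        return Box::new(Foo { stext: "" });
    }
    let slice = slice::from_raw_parts(pstext, itextlen as usize);
    // Unchecked: the last byte of `slice` is taken as the NUL terminator without
    // verification; if it is not NUL, this is undefined behavior
    let cstr = CStr::from_bytes_with_nul_unchecked(slice);
    match cstr.to_str() {
        // Here `s` is a regular `&str` and we can work with it
        Ok(s) => Box::new(Foo { stext: s }),
        // Invalid UTF-8: fall back to an empty text
        Err(_) => Box::new(Foo { stext: "" }),
    }
}

#[no_mangle]
pub extern "C" fn foo_get_text(opfoo: Option<&Foo>, psbuffer: *mut u8, ibufferlength: u32) -> i32 {
    // The host buffer must exist and have room for at least the NUL byte
    if psbuffer.is_null() || ibufferlength == 0 {
        return -1;
    }
    match opfoo {
        Some(f) => {
            let bytes = f.stext.as_bytes();
            // Keep at most `ibufferlength - 1` bytes so the NUL byte still fits
            // (this may cut a multi-byte UTF-8 character in half)
            let vstxt = if bytes.len() < ibufferlength as usize {
                bytes
            } else {
                &bytes[..ibufferlength as usize - 1]
            };
            // `stext` came from a `CStr`, so it contains no interior NUL bytes
            let cstxt = unsafe { CString::from_vec_unchecked(vstxt.to_vec()) };
            unsafe {
                // `c_char` is `i8` on x86_64 but `u8` on some targets (e.g. aarch64 Linux)
                libc::strcpy(psbuffer as *mut libc::c_char, cstxt.as_ptr());
            }
            // Return the content length (without the NUL byte)
            vstxt.len() as i32
        }
        // Return -1 as error code
        None => -1,
    }
}

#[no_mangle]
pub extern "C" fn foo_print(opfoo: Option<&Foo>) {
    if let Some(f) = opfoo {
        println!("foo text: '{}'", f.stext);
    }
}

#[no_mangle]
// The `Box` is moved into this function and dropped there, freeing the `Foo`
pub extern "C" fn foo_delete(_: Option<Box<Foo>>) {}
```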
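For comparison, the copy step of `foo_get_text()` can also be written without `libc::strcpy()` and without the pointer cast. This is a minimal sketch working on a Rust slice so it can be tested on its own; `copy_to_host_buffer` is a hypothetical name. Inside the FFI function, the slice would presumably be built from `psbuffer` and `ibufferlength` with `std::slice::from_raw_parts_mut`. It keeps the same truncation rule: at most `ibufferlength - 1` text bytes plus the NUL byte.

```rust
/// Hypothetical safe counterpart of the copy step in `foo_get_text()`:
/// copies at most `buffer.len() - 1` bytes of `text` into a buffer owned by
/// the caller (the host) and terminates it with a NUL byte.
/// Returns the number of text bytes copied, or `None` if the buffer has no room.
fn copy_to_host_buffer(text: &str, buffer: &mut [u8]) -> Option<usize> {
    // Reserve one byte for the NUL terminator
    let capacity = buffer.len().checked_sub(1)?;
    let bytes = text.as_bytes();
    let n = bytes.len().min(capacity);
    buffer[..n].copy_from_slice(&bytes[..n]);
    buffer[n] = 0;
    Some(n)
}

fn main() {
    // The host allocates the buffer; the library only writes into it
    let mut buffer = [0xFFu8; 21];
    let n = copy_to_host_buffer("Foo Text Content 0123", &mut buffer).unwrap();
    println!("returned {n}: {:?}", std::str::from_utf8(&buffer[..n]).unwrap());
}
```

With a 21-byte buffer, 20 text bytes fit, so the text is cut to "Foo Text Content 012", mirroring the behavior of the original function.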
When `itextlen` is set to the string length, as returned by `strlen()`, only the first `itextlen - 1` bytes become part of the Rust structure:
```
$ ./demorustfoo
Foo: building ...
Foo: input Content 'Foo Text Content 0123'.
Foo: built with Content 'Foo Text Content 0123'.
Foo: built.
Foo: printing ...
foo text: 'Foo Text Content 012'
Foo: printed.
Foo: get Text ...
Foo: got Text returned '20'.
Foo: got Text Content 'Foo Text Content 012'.
Foo: deleting ...
Foo: deleted.
```
The example shows that the input content "Foo Text Content 0123" is not changed by the Rust library; only a part of it, the first `itextlen - 1` bytes ("Foo Text Content 012"), is referenced by the Rust structure field `Foo.stext`.
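The truncation can be avoided by letting Rust verify the terminator instead of assuming it. This minimal sketch swaps the unchecked call in `foo_new` for the checked `CStr::from_bytes_with_nul`; `parse_buffer` is a hypothetical helper. With the checked call, a `strlen()`-style length is rejected rather than silently dropping the last byte.

```rust
use std::ffi::CStr;

/// Hypothetical checked variant of the conversion in `foo_new`:
/// returns `None` instead of invoking undefined behavior when `buf` does not
/// end with exactly one NUL byte, or when the text is not valid UTF-8.
fn parse_buffer(buf: &[u8]) -> Option<&str> {
    CStr::from_bytes_with_nul(buf).ok()?.to_str().ok()
}

fn main() {
    let text = b"Foo Text Content 0123\0";
    let strlen = text.len() - 1; // 21, what strlen() would return

    // itextlen = strlen(): the last byte is '3', not NUL, so it is rejected
    assert_eq!(parse_buffer(&text[..strlen]), None);
    // itextlen = strlen() + 1: the full text survives
    assert_eq!(parse_buffer(&text[..strlen + 1]), Some("Foo Text Content 0123"));
    println!("checked conversion ok");
}
```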
Another interesting observation is that `Foo.stext` keeps pointing to the buffer memory even after the host application has disposed of its pointer:
```
$ ./demorustfoo
Foo: building ...
Foo: input Content 'Foo Têxte Contént utf-8 0123'.
Foo: built with Content 'Foo Têxte Contént utf-8 0123'.
Foo: built.
Content disposing ...
Content [Length '0']: Text ''.
Foo: printing ...
foo text: 'Foo Têxte Contént utf-8 012'
Foo: printed.
Foo: get Text ...
Foo: got Text returned '29'.
Foo: got Text Content 'Foo Têxte Contént utf-8 012'.
Foo: deleting ...
Foo: deleted.
```
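This might be risky, because that memory could be reassigned to another variable during the application's lifetime. If the host actually frees the buffer, `Foo.stext` becomes a dangling reference, and reading it is undefined behavior.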
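If the text has to outlive the host's buffer, one option (my assumption, not what the library above does) is to let the Rust structure own a copy. In this minimal sketch, `FooOwned` and `foo_owned_new` are hypothetical names, and the constructor is a plain Rust `unsafe fn` rather than an `extern "C"` export so the example runs on its own.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical owning variant of `Foo`: it copies the text, so it no longer
/// depends on the host keeping its buffer alive.
struct FooOwned {
    stext: String,
}

/// # Safety
///
/// `pstext` must be null or point to a NUL-terminated string.
unsafe fn foo_owned_new(pstext: *const c_char) -> FooOwned {
    if pstext.is_null() {
        return FooOwned { stext: String::new() };
    }
    // Copy the bytes into a Rust-owned `String`; invalid UTF-8 becomes U+FFFD
    let stext = unsafe { CStr::from_ptr(pstext) }.to_string_lossy().into_owned();
    FooOwned { stext }
}

fn main() {
    let host_text = CString::new("Foo Têxte Contént utf-8 0123").unwrap();
    let foo = unsafe { foo_owned_new(host_text.as_ptr()) };
    // The host releases its buffer; `foo` still holds its own copy
    drop(host_text);
    println!("foo text: '{}'", foo.stext);
}
```

The price is one allocation and copy per object, which is released when the Rust object is dropped.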
Also remarkable is the cast from `*mut u8` to `*mut i8` (strictly, `*mut libc::c_char`, which is `i8` on x86_64) that the `libc::strcpy()` function requires. Since `i8::MAX` is 127 (see the Rust documentation of the `i8` data type), I was wondering how it would handle UTF-8 text, which uses byte values of 128 and above to represent multi-byte characters.
In the byte dump below we see the combinations "195:170 -> '234/0000EA'" (ê) and "195:169 -> '233/0000E9'" (é):
```
$ echo "Foo Têxte Contént utf-8 0123" | ../text*/text*.run -i -d
input: reading ...
chunk: cnt: '31' bytes read
chunk: cnt: '0' bytes read
input: cnt: '31' bytes read
input: cnt: '31'; 'Foo Têxte Contént utf-8 0123
'
input: done.
70:'F'|111:'o'|111:'o'|32:' '|84:'T'|195:170:'234/0000EA':'':|120:'x'|116:'t'|101:'e'|32:' '|67:'C'|111:'o'|110:'n'|116:'t'|195:169:'233/0000E9':'':
rs no rpl (cnt: '31'): resize to -> '62'
|110:'n'|116:'t'|32:' '|117:'u'|116:'t'|102:'f'|45:'-'|56:'8'|32:' '|48:'0'|49:'1'|50:'2'|51:'3'|10:'
'|read: done.
result (cnt: '45'): 'Foo T(?0000EA)xte Cont(?0000E9)nt utf-8 0123
'
output: writing ...
chunk '1/45/45': write go ...
Foo T(?0000EA)xte Cont(?0000E9)nt utf-8 0123
chunk: cnt: '45/45/45' bytes written
output: done.
```
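The mapping in the dump is ordinary UTF-8 decoding: a two-byte sequence carries 5 payload bits in the lead byte (`110xxxxx`) and 6 in the continuation byte (`10xxxxxx`), so `(195 & 0x1F) << 6 | (170 & 0x3F)` = 3 · 64 + 42 = 234. A small sketch reproducing both pairs; `decode_two_byte` is a hypothetical helper, not part of the tool used above.

```rust
/// Decode a two-byte UTF-8 sequence (lead byte 110xxxxx, continuation byte
/// 10xxxxxx) into its code point. Overlong forms (lead 0xC0/0xC1) are not
/// rejected here; this is only meant to illustrate the bit layout.
fn decode_two_byte(lead: u8, cont: u8) -> Option<u32> {
    if lead & 0xE0 != 0xC0 || cont & 0xC0 != 0x80 {
        return None; // not a two-byte sequence
    }
    Some(((lead as u32 & 0x1F) << 6) | (cont as u32 & 0x3F))
}

fn main() {
    for (lead, cont) in [(195u8, 170u8), (195, 169)] {
        let cp = decode_two_byte(lead, cont).unwrap();
        // Same "decimal/hex" notation as the dump, e.g. 195:170 -> 234/0000EA
        println!("{lead}:{cont} -> {cp}/{cp:06X} '{}'", char::from_u32(cp).unwrap());
    }
    // The standard library's decoder agrees
    assert_eq!(std::str::from_utf8(&[195, 170]).unwrap(), "ê");
}
```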
Surprisingly, the string data was copied without corruption:
```
Foo: printing ...
foo text: 'Foo Têxte Contént utf-8 012'
Foo: printed.
Foo: get Text ...
Foo: got Text returned '29'.
Foo: got Text Content 'Foo Têxte Contént utf-8 012'.
```
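The lack of corruption can be explained in isolation: casting `*mut u8` to `*mut i8` does not touch the data. Bytes of 128 and above are merely read as negative numbers (195 becomes -61), and `strcpy()` copies them unchanged until it reaches the zero byte. A small sketch of that reinterpretation; `as_signed` is a hypothetical helper.

```rust
/// Views the same bytes through an `i8` pointer, as `psbuffer as *mut i8` does.
fn as_signed(bytes: &[u8]) -> &[i8] {
    // SAFETY: u8 and i8 have the same size and alignment, and every bit
    // pattern is valid for both, so only the element type changes
    unsafe { std::slice::from_raw_parts(bytes.as_ptr() as *const i8, bytes.len()) }
}

fn main() {
    let text = "Têxte";
    let signed = as_signed(text.as_bytes());
    // 'ê' (195, 170) shows up as negative numbers, but the bits are unchanged
    println!("{:?} -> {:?}", text.as_bytes(), signed);

    // Reading the bytes back as u8 restores the original UTF-8 text
    let restored: Vec<u8> = signed.iter().map(|&b| b as u8).collect();
    assert_eq!(std::str::from_utf8(&restored).unwrap(), text);
}
```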