Which data type to use for FFI call doing in-place character processing?

#1

I have a Windows DLL, created in Delphi, which exposes a function with a contract

procedure analyys (p : pchar; len : word); far stdcall external 'ana.dll';

As this is a “procedure”, the result is written into the same area of memory as the first argument (a nul-terminated CP1257 string), and the size of that area is passed as the second argument.

And it seems that Rust has so many possibilities here that I really struggle to choose: array, Vec, CStr/CString, &str/String, slices, Box (?)…

It seems like a silly question, but I really could not google my way out of it. So basically what I have right now is a Vec returned by the “encode” crate method. I have to allocate a 4096-byte area, write those Vec bytes into the beginning of it, zero-fill the rest, call the DLL function, then copy the bytes up to the first 0 back into a Vec, pass that to the “decode” crate method to get a String, and return it.

Currently I do it like this:

fn analyze_encoded(encoded_word: Vec<u8>) -> Result<String, String> {
    let mut text: [u8; 4096] = [0; 4096];
    let mut i = 0;
    for c in encoded_word {
        text[i] = c;
        i += 1;
    }
    const LEN: u16 = 4095;
    if let Err(e) = analyze_dll(&text[0] as *const u8, LEN) {
        return Err(e.description().to_string());
    }
    match WINDOWS_1257.decode(&text, DecoderTrap::Strict) {
        Ok(s) => Ok(s),
        Err(e) => Err(e.into_owned())
    }
}

I cannot check whether the code works because of a weird cargo issue (https://github.com/rust-lang/cargo/issues/6754), but I still wonder: is this actually correct?

I’ve seen copy_from_slice/clone_from_slice, but they seem to require both sides to be the same size. So how do I copy the contents of a slice (Vec) into an array, and vice versa?

#2

I can think of a few ways to do this. One would be to just use the encoded_word vec directly instead of having a separate scratch space at all:

fn analyze_encoded(mut encoded_word: Vec<u8>) -> Result<String, String> {
    // ensure that the `vec` is the right size and zero-padded
    encoded_word.resize(4096, 0);
    const LEN: u16 = 4095;
    if let Err(e) = analyze_dll(encoded_word.as_ptr(), LEN) {
        return Err(e.description().to_string());
    }
    match WINDOWS_1257.decode(&encoded_word, DecoderTrap::Strict) {
        Ok(s) => Ok(s),
        Err(e) => Err(e.into_owned())
    }
}

But if it’s important for the arguments passed to Delphi to be on the stack for some reason, then you just have to make sure that you’re copying between slices of equal length:

fn analyze_encoded(encoded_word: Vec<u8>) -> Result<String, String> {
    let mut text: [u8; 4096] = [0; 4096];
    text[0..encoded_word.len()].copy_from_slice(&encoded_word);

    const LEN: u16 = 4095;
    if let Err(e) = analyze_dll(text.as_ptr(), LEN) {
        return Err(e.description().to_string());
    }
    match WINDOWS_1257.decode(&text, DecoderTrap::Strict) {
        Ok(s) => Ok(s),
        Err(e) => Err(e.into_owned())
    }
}

#3

Wow, super! I didn’t know about resize(). Also, what did you mean by being “on the stack”? I guess the pointer and length are passed via the stack in any case, right? Or do you mean that the array is allocated on the stack?

#4

Also, how do I take a slice of the bytes from the first one up to the one before the first 0 in an array/vec/slice? Is there a method/trait for that, or do I just walk them one by one until I find a 0?

#5

seems like something like this for the latter question: https://www.reddit.com/r/rust/comments/3y384i/reading_a_nullterminated_c_string_from_read/

#6

Yeah, for the first question I meant allocated on the stack. But I imagine that doesn’t really matter.

As for the second question, you can use the position adapter on iterators:

fn main() {
    let mut a = [0u8; 10];
    
    a[0] = 42;
    a[1] = 69;
    a[2] = 255;
    
    // evaluates to `3`
    let first_zero = a.iter().position(|&x| x == 0).unwrap();
    
    println!("first zero is at index {}", first_zero);
}

But making a CStr should work fine too.
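For completeness, here is a minimal sketch of the CStr route. It assumes Rust 1.69 or newer, where CStr::from_bytes_until_nul is available; the helper name bytes_before_nul is made up for illustration. The returned slice can then be handed to the decoder instead of the whole buffer.

```rust
use std::ffi::CStr;

/// Hypothetical helper: returns the bytes before the first nul,
/// or `None` if the buffer contains no nul byte at all.
fn bytes_before_nul(buf: &[u8]) -> Option<&[u8]> {
    // `from_bytes_until_nul` stops at the first nul and ignores the rest.
    CStr::from_bytes_until_nul(buf).ok().map(CStr::to_bytes)
}

fn main() {
    let mut a = [0u8; 10];
    a[0] = 42;
    a[1] = 69;
    a[2] = 255;

    // Only the three leading bytes are kept; the zero padding is dropped.
    let trimmed = bytes_before_nul(&a).expect("buffer contains a nul byte");
    println!("{:?}", trimmed);
}
```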

#7

But CStr is in UTF-8, no?

#8

Nope, a CStr is an abstraction representing a sequence of non-nul bytes followed by a single nul byte (the C string convention). You could imagine that as being the same as a [u8] (+ the nul-invariant).

Since its size is dynamic, it can only be manipulated through a reference / pointer; usually a shared reference in this case: hence &CStr is the type often used to represent a view / a reference / a “pointer” to the aforementioned stream of bytes. In other words, &CStr is the “equivalent” of char const * in C.
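As a small illustration of the point that a CStr is not tied to UTF-8, the sketch below builds one from bytes that are not valid UTF-8. The byte values and the helper name is_utf8 are arbitrary choices for the example, not taken from the DLL.

```rust
use std::ffi::CStr;

/// Hypothetical helper: reports whether the C string happens to be valid UTF-8.
fn is_utf8(c: &CStr) -> bool {
    c.to_str().is_ok()
}

fn main() {
    // 0xE0 followed by 0xFE is not a valid UTF-8 sequence,
    // yet it is a perfectly fine C string: no interior nul, one trailing nul.
    let raw = CStr::from_bytes_with_nul(b"\xE0\xFE\0").expect("well-formed C string");

    // The raw bytes (without the trailing nul) are always accessible.
    println!("bytes: {:?}", raw.to_bytes());
    // Conversion to &str is a separate, fallible step.
    println!("valid UTF-8: {}", is_utf8(raw));
}
```

So for CP1257 data, to_bytes() gives the slice to feed into the decoder, and to_str() is simply not used.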

Safety warning

If you are providing a pointer to Rust data / memory to be mutated by C, such a pointer should be created from a &mut reference. This means using .as_mut_ptr() instead of .as_ptr().

Failure to do so, as in the OP’s example, as well as @FenrirWolf’s example, is Undefined Behavior.

If the FFI API is badly designed and asks for a *const instead of *mut, you can always cast the return value of .as_mut_ptr() as *const _. Do not short-circuit it to the falsely innocent-looking .as_ptr().
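To make that concrete, here is a rough sketch of the stack-buffer variant using .as_mut_ptr(). Since the real DLL is not available here, analyze_stub is a hypothetical stand-in that upper-cases ASCII bytes in place. The real analyys would be declared in an extern block instead and may behave differently. The function name analyze_in_place and the length check are also additions for illustration.

```rust
/// Hypothetical stand-in for the DLL function: upper-cases ASCII bytes
/// in place, up to the first nul. Not the real `analyys`.
unsafe extern "C" fn analyze_stub(p: *mut u8, len: u16) {
    // Caller must guarantee `p` points to at least `len` writable bytes.
    let buf = unsafe { std::slice::from_raw_parts_mut(p, len as usize) };
    for b in buf.iter_mut().take_while(|b| **b != 0) {
        b.make_ascii_uppercase();
    }
}

fn analyze_in_place(input: &[u8]) -> Result<Vec<u8>, String> {
    const CAP: usize = 4096;
    // Leave room for at least one trailing nul; avoids a panic in the copy below.
    if input.len() >= CAP {
        return Err(format!("input of {} bytes does not fit", input.len()));
    }
    let mut text = [0u8; CAP];
    text[..input.len()].copy_from_slice(input);

    // The callee writes through the pointer, so derive it from a mutable borrow.
    unsafe { analyze_stub(text.as_mut_ptr(), (CAP - 1) as u16) };

    // Keep only the bytes before the first nul.
    let end = text.iter().position(|&b| b == 0).unwrap_or(CAP);
    Ok(text[..end].to_vec())
}

fn main() {
    let out = analyze_in_place(b"abc").expect("input fits in the buffer");
    println!("{:?}", out);
}
```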
