CString::into_string's behavior on Android?

Hi there,
I've got a CString problem that occurs on the Android ABI. The code is here, and is reproduced below:

use std::ffi::CString;

const PR_GET_NAME: libc::c_int = 16;

fn main() {
    unsafe {
        let cs = CString::from_vec_unchecked(Vec::with_capacity(32) as Vec<u8>);
        let ptr = cs.into_raw();
        let ret: libc::c_int = libc::prctl(PR_GET_NAME, ptr);
        if ret != 0 {
            println!("prctl failed: {}", ret);
            return;
        }
        let c_string = CString::from_raw(ptr);
        match c_string.into_string() {
            Ok(s) => {
                println!("Return: {}, debug print: {:#?}, len: {}.", s, s, s.len());
            },
            Err(_) => println!("into_string failed.!"),
        };
    }
}

On my Linux x86_64 machine, the output is ALWAYS correct. Say the thread name is demoarm64:

Return: demoarm64, debug print: "demoarm64", len: 9.

On aarch64-linux-android (NDK-r17c, API-Level 26), the output looks confusing:

Return: demoarm6?, debug print: "demoarm6\u{0}", len: 9.

I use a question mark for the unprintable character. This difference doesn't happen when the thread name (the CString) contains only letters.

Am I using CString the wrong way, or is it something else? Any information will be helpful, thanks!

I'm guessing libc::prctl() overwrote the 4 in demoarm64 with a null character, and that's what's messing things up.

When you use CString::from_raw() it'll take the pointer, assume it came from CString::into_raw() and calculate the correct address of the original CString, complete with the length field.

The problem is prctl() moved the end of the string back one byte without updating the CString's length, so when we call CString::into_string() it'll try to convert length bytes to UTF-8 and because a null is a valid UTF-8 character it'll complete without any errors.

That's my hypothesis anyway, you may need to do a little experimenting to see what is actually going on. If you've got access to a debugger, even better.
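
To illustrate why the conversion would not complain, here's a minimal sketch. It assumes into_string() does nothing beyond ordinary UTF-8 validation of the bytes it holds; bytes_to_string is a made-up helper for this example, not a std function:

```rust
/// Hypothetical helper: validates raw bytes as UTF-8, with no special
/// treatment of NUL. Assumed to mirror the check into_string() performs.
fn bytes_to_string(bytes: Vec<u8>) -> Result<String, std::string::FromUtf8Error> {
    String::from_utf8(bytes)
}

fn main() {
    // U+0000 is a valid one-byte UTF-8 sequence, so an embedded NUL
    // survives the conversion and is counted in the length.
    let s = bytes_to_string(b"demoarm6\0".to_vec()).expect("NUL is valid UTF-8");
    println!("debug print: {:?}, len: {}", s, s.len());

    // A byte that can never appear in well-formed UTF-8 is rejected.
    assert!(bytes_to_string(vec![b'a', 0xFF]).is_err());
}
```

So a stray NUL inside the tracked length goes through silently, while a stray non-UTF-8 byte makes the conversion fail.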

I believe a CString should be an allocation that exactly matches the underlying c string. Try something like this:

use std::ffi::CStr;
use std::os::raw::c_char;

const PR_GET_NAME: libc::c_int = 16;

fn main() {
    unsafe {
        // Zero-initialized bytes: the element type is explicit and the buffer is always NUL-terminated.
        let mut allocation: Vec<u8> = vec![0; 32];
        let ptr = allocation.as_mut_ptr() as *mut c_char;
        let ret: libc::c_int = libc::prctl(PR_GET_NAME, ptr);
        if ret != 0 {
            println!("prctl failed: {}", ret);
            return;
        }
        
        let c_str_slice = CStr::from_ptr(allocation.as_ptr() as *const c_char);
        
        match c_str_slice.to_str() {
            Ok(s) => {
                println!("Return: {}, debug print: {:#?}, len: {}.", s, s, s.len());
            },
            Err(_) => println!("to_str failed!"),
        };
    }
}

The above doesn't tie the lifetime of the CStr to the vector. You can do that like this.
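
One possible way to get a borrow-checked CStr, sketched under assumptions: it needs Rust 1.69 or later for CStr::from_bytes_until_nul, name_from_buffer is a made-up helper, and the buffer is filled by hand as a stand-in for what prctl would write:

```rust
use std::ffi::CStr;

/// Hypothetical helper: borrows the NUL-terminated prefix of `buf` as a &CStr.
/// The returned reference carries the lifetime of `buf`, so the borrow
/// checker rejects any use after the buffer is dropped or mutated.
fn name_from_buffer(buf: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_until_nul(buf).ok()
}

fn main() {
    // Stand-in for the buffer that prctl(PR_GET_NAME, ...) would fill.
    let mut buf = vec![0u8; 32];
    buf[..9].copy_from_slice(b"demoarm64");

    match name_from_buffer(&buf).and_then(|c| c.to_str().ok()) {
        Some(name) => println!("Return: {}, len: {}.", name, name.len()),
        None => println!("buffer did not hold a valid C string"),
    }
}
```

Because the function takes a slice rather than a raw pointer, it also never reads past the end of the buffer when the terminator is missing; it returns None instead.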

Thanks for the reply.

Following your suggestion, I used CString::into_bytes_with_nul() instead of CString::into_string() and created the String with String::from_utf8, and found something more interesting.

    let c_string = CString::from_raw(ptr);
    let bytes = c_string.into_bytes_with_nul();
    println!("Bytes: {:#?}", bytes);
    match String::from_utf8(bytes) {
        Ok(s) => {
            // s.retain(|c| c.is_alphabetic());  // to bypass this problem.
            println!("Return: {}, debug print: {:#?}, len: {}.", s, s, s.len());
        },
        Err(e) => println!("into_string failed: {}!", e),
    };

The Linux output is always the same, ending with a \0 byte.

Bytes: [
    100,
    101,
    109,
    111,
    97,
    114,
    109,
    54,
    52,
    0,
]
Return: demoarm64, debug print: "demoarm64\u{0}", len: 10.

But the Android output ends with random bytes, and String creation fails:

/data/local/tmp # ./demoarm64
Bytes: [
    100,
    101,
    109,
    111,
    97,
    114,
    109,
    54,
    192,
    224,
]
into_string failed: invalid utf-8 sequence of 1 bytes from index 8!
/data/local/tmp # ./demoarm64
Bytes: [
    100,
    101,
    109,
    111,
    97,
    114,
    109,
    54,
    192,
    32,
]
into_string failed: invalid utf-8 sequence of 1 bytes from index 8!

@Alice's solution is the correct one. If libc::prctl() is copying a string to some location you specify and doesn't actually care about the current contents, then you should create a suitably sized byte buffer (e.g. vec![0; 256]) and pass a pointer to it (my_buffer.as_mut_ptr()) to the function.

Afterwards, you use CStr::from_ptr() on your buffer to interpret it as a C-style string and can convert it to Rust from there.

Sounds like there's a bug in CString's implementation. Maybe an off-by-one or something specific to the android architecture. You should create an issue against the Rust repo with your example.

This:

let cs = CString::from_vec_unchecked(Vec::with_capacity(32) as Vec<u8>);

will internally push a null-byte to the vector, and then call into_boxed_slice on the vector. This conversion will use realloc to shorten the allocation to one byte, so it is not surprising that the data after that one byte is behaving weirdly, as you are now writing to deallocated memory.

You are also likely corrupting your heap when the CString is deallocated at the end of main, as you are giving the deallocate method the wrong length for the allocation.
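
A small sketch of the first point. The size of the underlying allocation isn't observable through the public API, so this only shows the length of the slice the CString considers its own. It uses the safe CString::new in place of from_vec_unchecked (for an empty Vec both end up storing just the terminator), and owned_len is a made-up helper:

```rust
use std::ffi::CString;

/// Hypothetical helper: returns how many bytes (including the trailing NUL)
/// a CString built from an empty Vec with spare capacity actually holds.
fn owned_len(capacity: usize) -> usize {
    let v: Vec<u8> = Vec::with_capacity(capacity);
    // The Vec's length is 0, so the only byte stored is the NUL terminator.
    let cs = CString::new(v).expect("an empty Vec has no interior NUL");
    cs.as_bytes_with_nul().len()
}

fn main() {
    // The 32 bytes of spare capacity are not part of the CString.
    println!("bytes owned: {}", owned_len(32)); // prints "bytes owned: 1"
}
```

Anything written through into_raw() beyond that single byte is outside what the CString owns.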

Thanks, I was using CString::from_vec_unchecked the wrong way.
It returns a CString whose inner boxed slice holds only the Vec's actual length plus 1 null byte. The memory after that is out of bounds.

And I did suffer weird heap corruption in my project :joy:.

I'm wondering why it doesn't happen on Linux?

It's possible that the memory allocator on Linux ignores the length argument, malloc/free style, so it doesn't matter that you gave it the wrong length.

I see, maybe the allocator doesn't give such a small piece of memory back to the system.
