CString::from_raw danger?


#1

So… I always figured that the reason why CString::{into_raw,from_raw} exist (and are the only way to work with *mut c_char) is because foreign code might replace one of the interior characters with a NUL. I figured they did some evil trickery behind the scenes like store the original length of the allocation at a fixed negative offset from the pointer.

I dunno. I didn’t really think much about it… until today, when I really began wondering why CString::from_vec_unchecked is unsafe, and why the from_raw documentation mentions that “the length” is recomputed. What length, I wondered? Surely not the length of the allocated byte slice?

pub unsafe fn from_raw(ptr: *mut c_char) -> CString {
    let len = sys::strlen(ptr) + 1; // Including the NUL byte
    let slice = slice::from_raw_parts_mut(ptr, len as usize);
    CString { inner: Box::from_raw(slice as *mut [c_char] as *mut [u8]) }
}

…yep. It recomputes the length of the allocated byte slice.

Which suggests to me that the following code is actually UB:

use ::std::ffi::CString;

fn main() {
    // Allocate a CString of 14 bytes (including the NUL)
    let ptr = CString::new(b"Hello, world!".to_vec()).unwrap().into_raw();

    // Give it to some C function which destructively
    // reads the string, inserting a NUL after "Hello,"
    unsafe { *ptr.offset(6) = 0; }

    // Recover the CString so rust can deallocate it
    let string = unsafe { CString::from_raw(ptr) };
    
    assert_eq!(string.as_bytes(), b"Hello,");

    // !!!!! To my understanding, this invokes UB! !!!!!
    // The allocator will be falsely told that the size is 7.
    drop(string);
}

Am I wrong?

Edit: Fixed the code sample


#2

It’s right there in the docs: (edit: which you did mention)

Additionally, the length of the string will be recalculated from the pointer.

Part of the CString's contract is that there are no \0 bytes except the terminator. Perhaps this hazard should just be added to this method’s # Safety documentation?


#3

I figured the internal representation of CString was maybe something like

struct CString {
    data: Box<CStringInner>,
}

struct CStringInner {
    // Precomputed index of the first NUL byte, which is what
    // the end of the CString will appear to be for functions like as_bytes()
    effective_len: usize,

    // The allocated buffer
    data: [u8],
}

and that “the length” might be referring to something like effective_len there.


Part of the CString’s contract is that there are no \0 bytes except the terminator. Perhaps this hazard should just be added to this method’s # Safety documentation?

This “hazard” makes CString and into_raw practically useless!

It is quite difficult for me to come up with legitimate examples of C functions that:

  • Take a mutable *char
  • Do not have any code paths which may need to write a NUL to an interior byte.

except for I guess something like void convert_ascii_to_uppercase(char *).


#4

If you have functions that may write NUL to the middle of a buffer, then maybe a CString isn’t what you want at all? And instead you want a Vec<u8> ?


#5

I’m saying that virtually every C function I can think of that takes non-const char * probably writes internal NUL bytes, and yet people will use CString::into_raw anyways because it’s the only monomorphic function in all of std that returns a *mut c_char.[^1] That’s seriously not good.


I went through the first page of Github search results for CString::from_raw to test my theory.

I found:

  • Five that use the function correctly, but did not need *mut c_char and could have gotten away with it if CString::into_raw returned *const

    • does it correct but doesn’t need mut (could have used <&CStr>::as_ptr)
    • does it correct but very needlessly
    • does it correct but doesn’t need *mut (could have used <&CStr>::as_ptr)
    • does it correct but doesn’t need *mut (however into_raw was indeed necessary as ownership was temporarily given to C)
    • seems correct (too many usages for me to check them all), but I’m not sure if it needs *mut. It seems to me that all commonmark API functions take const char *.
  • One that does need *mut c_char… but as I predicted, it writes an interior NUL:

  • The other four are either on an entirely different plane of wrongness, or I couldn’t make sense of them.


[^1] Or at least, I think it is. I can’t tell because “as return value” search on rustdoc doesn’t seem to work for pointer types yet