`String::with_capacity(MAX_PATH).as_mut_ptr()` can lead to undefined behavior

The following Rust code:

const MAX_PATH: usize = 260;
fn main() {
    let str1 = String::with_capacity(MAX_PATH).as_mut_ptr();
    let str2 = String::from("TotototototototototototototototototoT").as_mut_ptr();
    unsafe {
        std::ptr::copy_nonoverlapping(str2, str1, 30);
        println!("{:?}", String::from_raw_parts(str1,30,30));
    }
}

uses the String::with_capacity function to create a string with a capacity of 260 characters. It then uses the as_mut_ptr method to get a raw pointer to the string.

However, this approach can lead to undefined behavior because:

The String::with_capacity function only guarantees to allocate enough memory to store the specified number of characters. It does not guarantee that the allocated memory is valid.
The as_mut_ptr method returns a raw pointer to the internal data of the string. This pointer may point to uninitialized memory or memory that has been invalidated by other operations.
Therefore, using the String::with_capacity(MAX_PATH).as_mut_ptr() method to create a raw pointer can lead to the following problems
-Program crashes
-Data corruption
-Security vulnerabilities

Expected Behavior
The String::with_capacity function should not allow the creation of a raw pointer to uninitialized memory.

Similar to String::from("x").as_mut_ptr(), such raw pointers should not be compiled successfully.

I see you opened an issue, nice find!

Using unsafe incorrectly and got UB, huh.

On a more serious note:

The whole point of with_capacity is creating uninitialized memory. But following code

let str1 = String::with_capacity(MAX_PATH).as_mut_ptr();`

Definitely does not do what you think it do. It creates a String, get its internal pointer, then the String is freed, then your pointer is dangling. (and your str2 is dangling too.)

Uninitialized memory is not the problem, dangling pointer is the problem.

I think you confused as_mut_ptr with into_raw_parts. And you can't just use String::from_raw_parts with wrong capacity parameters in.

10 Likes

My thought is whether we can fundamentally eliminate the occurrence of dangling pointers right from the source.

In general, we can't, unless we remove as{_mut}_ptr completely. Once pointer is created, it's up to caller to ensure that owner of the corresponding data is not dropped.

3 Likes

Raw pointers are designed so they can be obtained in safe Rust, but can only be dereferenced in unsafe Rust, since dereferencing them is what is unsafe.

2 Likes

This is almost identical to a common footgun in CString. They even call it out in the docs:

WARNING

The returned pointer is read-only; writing to it (including passing it to C code that writes to it) causes undefined behavior.

It is your responsibility to make sure that the underlying memory is not freed too early. For example, the following code will cause undefined behavior when ptr is used inside the unsafe block:

use std::ffi::CString;

// Do not do this:
let ptr = CString::new("Hello").expect("CString::new failed").as_ptr();
unsafe {
    // `ptr` is dangling
    *ptr;
}

This happens because the pointer returned by as_ptr does not carry any lifetime information and the CString is deallocated immediately after the CString::new("Hello").expect("CString::new failed").as_ptr() expression is evaluated. To fix the problem, bind the CString to a local variable:

use std::ffi::CString;

let hello = CString::new("Hello").expect("CString::new failed");
let ptr = hello.as_ptr();
unsafe {
    // `ptr` is valid because `hello` is in scope
    *ptr;
}

This way, the lifetime of the CString in hello encompasses the lifetime of ptr and the unsafe block.

I'm surprised there isn't already a Clippy lint checking for calling as_ptr() or as_mut_ptr() on a temporary variable, though. That pattern is almost always guaranteed to give you a dangling pointer.

2 Likes

Sure. Stop using raw pointers and use references instead. That's why they exist.

The whole point of raw pointers is to be an escape hatch for things the compiler can't check, where you're required to understand and follow the soundness requirements.

6 Likes

The original code uses libloading to load C++ DLLs and call their methods. This inevitably involves the use of raw pointers.

There is the original code. rust and cpp

use libc::{size_t, wchar_t};
use widestring::WideString;

const UTILITIES: &str = "utilities.dll";
const MAX_PATH: usize = 260;

pub fn test() -> Result<size_t, Box<dyn std::error::Error>> {
    let mut dll_path = std::env::current_dir().unwrap();
    dll_path.push(UTILITIES);
    unsafe {
        let lib = libloading::Library::new(dll_path)?;
        let get_company: libloading::Symbol<
            extern "C" fn(*const wchar_t, size_t, *mut wchar_t, size_t) -> size_t,
        > = lib.get(b"get_company")?;


        let filename = WideString::from("hahah");
        let file_size = filename.len();
        let file_ptr = filename.as_ptr();

        // let mut content: [u16; MAX_PATH] = [0; MAX_PATH];
        // let company = &mut content as *mut _ as *mut wchar_t;
        // let mut company_ws = WideString::with_capacity(MAX_PATH);
        // let mut company = company_ws.as_mut_ptr();

        let mut company = WideString::with_capacity(MAX_PATH).as_mut_ptr();
        let return_size = get_company(file_ptr, file_size, company, MAX_PATH);

        println!("company: {:?}", WideString::from_ptr(company,MAX_PATH));
        println!("return_size: {}", return_size);

        Ok(return_size)
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    std::env::set_current_dir(std::env::current_exe()?.parent().unwrap())?;
    // let binding = WideString::from("D:\\workspace\\rust_load\\filemaster.exe");
    // let name = get_company_sign(binding)?;
    // println!("name: {:?}, size: {}", name, name.len());
    println!("running");
    test()?;
    println!("after call test");
    Ok(())
}

UTILITIES_API size_t get_company(const wchar_t* name, size_t name_size, wchar_t* company_name, size_t company_size)
{

	std::wstring file_name(name, name_size);
	std::shared_ptr<SignatureInfo> p_sign = std::make_shared<SignatureInfo>();

	signature_tool::get_inst()->GetSignatureInfo(file_name, *p_sign);

	memset(company_name, 0, company_size * sizeof(wchar_t));

	wmemcpy_s(company_name, company_size, p_sign->m_strSignCompany.c_str(),
		(p_sign->m_strSignCompany.size()));

	std::wstring real_company(company_name);
	return real_company.size();
}

Then no, Rust can't "fundamentally eliminate" this. You have to do it right, like in C.

Reminder that copy_from_slice exists, which is safe, and using that will help avoid these problems.

5 Likes

Thanks.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.