[Solved] Understanding as_ptr example

Hi,

I am trying to use libc::stat, which takes a C string (*const c_char) as its first parameter:

pub unsafe extern fn stat(path: *const c_char, buf: *mut stat) -> c_int

If I create the string like so:

unsafe {
    let mut oldfilestat: libc::stat = mem::zeroed();
    let oldfileptr: *const c_char =  CString::new(oldfilepath.to_str().unwrap()).unwrap().as_ptr();
    let ret = libc::stat(oldfileptr, &mut oldfilestat);
    assert_eq!(ret, 0);
}

When I call as_ptr() while creating oldfileptr, as above, my assertion fails. I assume the pointer (oldfileptr) I am passing in is invalid. However, when I instead call as_ptr() inside the call to libc::stat, everything works:

unsafe {
    let mut oldfilestat: libc::stat = mem::zeroed();
    let oldfileptr =  CString::new(oldfilepath.to_str().unwrap()).unwrap();
    let ret = libc::stat(oldfileptr.as_ptr(), &mut oldfilestat);
    assert_eq!(ret, 0);
}

So the above code works, and all I changed was calling as_ptr() as part of the statement that makes the function call. Coming from a language like C, I'd expect both examples to work, not just the last. I assume there is some important Rust-specific concept about lifetimes or scope that I am missing here. Can someone explain this or point me to the relevant reading?

RAII, it comes from C++.

Hi .. Thanks. I've actually read this before. So of course as_ptr() doesn't actually create or point to something on the heap in the first example. The resource being pointed to is on the stack and not available in the called function, right? Without thinking about it, I guess I assumed as_ptr() would return a pointer I could use as I would in C. I guess I just need to do enough examples like this for things to hit home. It's strange: after working with C, doing things like this becomes ingrained and second nature, which makes it hard to learn Rust. There is no warning or error message because, as far as the compiler is concerned, there is no error, so it's hard to spot these issues ... it just takes practice.

Thanks.

  • CString::new() heap allocates a nul-terminated buffer;

  • the allocation lasts as long as that CString exists, due to RAII design.

This point is paramount, and explains the behavior you observed: when does the created CString get dropped?

To answer this, there are kind of two cases to consider:

  • either the (CString) value is bound to a variable name, and in that case the value is dropped when that binding goes out of scope,

  • or it is not bound to any variable explicitly, in which case Rust will handle the "anonymous binding" by trying to make it last as little as possible. For instance, if it is fed to a function, the value is dropped as soon as the function call returns / evaluates to a value, unless the return value borrows from the input, in which case Rust may figure out it needs to live longer.

In your example .as_ptr() returns a raw pointer, with no lifetime and thus no borrow. So Rust thinks you are done using the CString and drops it, thus leading to you having a dangling pointer.
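A minimal sketch of the pattern that keeps the pointer valid, using only the standard library. Both `c_style_len` and `path_len` are hypothetical names made up for this illustration; `c_style_len` stands in for a C function such as libc::stat that reads the string only while the call is running.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function such as `libc::stat`: it reads the
/// nul-terminated string behind `path` only while the call is running.
unsafe fn c_style_len(path: *const c_char) -> usize {
    // Sound only if `path` points to a live, nul-terminated buffer.
    let s = unsafe { CStr::from_ptr(path) };
    s.to_bytes().len()
}

/// Binds the `CString` to a name so its heap buffer outlives the call.
fn path_len(path: &str) -> usize {
    let owned = CString::new(path).expect("path contains an interior nul byte");
    // `owned` is still alive here, so the pointer stays valid for the whole call.
    let len = unsafe { c_style_len(owned.as_ptr()) };
    len
} // `owned` is dropped here, after the last use of the pointer
```

The only difference from the failing version is that the CString has a name, so it is dropped at the end of the block rather than at the end of the statement that created it.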

Thanks ... I was just thinking about this more and typing up another reply to say my explanation is incomplete. I see now that my explanation is not just incomplete but wrong. I think I kind of understand now, but probably not fully. Apparently there is some magic in the spec that keeps the value behind the pointer returned by as_ptr() (what would be named oldfileptr) alive through the function call when as_ptr() is called as part of a function argument. But if I call it in the statement before, the lifetime of what is pointed to gets shortened. In my mind I was expecting the following statements to be equivalent: I expected B from the first example to have the same lifetime as A in the second example.

// First Example
A = B.as_ptr()
mycall(A)

vs

// Second Example
A = B
mycall(A.as_ptr())

But in Rust they are not. The reason I was thinking that is because I naturally see the second example as doing the same thing as the first under the covers, i.e.:

A = B
mycall(C = A.as_ptr()) // Where the temp C is passed to mycall

or

A = B
C_temp = A.as_ptr()
mycall(C_temp)

Which would be identical to the first example, but this way of viewing the code is helping to trip me up. I can see what you highlighted is where the answer lies, but I still am not comfortable with understanding how long values last. I guess if A were accepted as a ref, B would then get preserved in the first example?

My operational explanation of this, which is probably too simplistic but generally works, is

  • Rust objects have scope- and usage-based lifetimes which the compiler tracks.

  • Rust references have lifetimes which the compiler tracks and validates. A reference lifetime is no longer than the lifetime of the object to which it refers.

  • Raw pointers designate raw memory rather than Rust objects, no matter how the pointers are derived, so the lifetime of such a pointer is not related to the lifetime of the object from which it is derived.
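To make the third point concrete, here is a small sketch (the function name is made up for illustration). It compiles precisely because the raw pointer is not tracked as a borrow; the pointer is deliberately never dereferenced, since it may dangle after the mutation.

```rust
/// Returns the final string; the point is that this function compiles at all.
fn mutate_while_raw_pointer_exists() -> String {
    let mut s = String::from("abc");
    // The borrow of `s` lasts only for the duration of the `as_ptr()` call.
    let stale: *const u8 = s.as_ptr();
    // Accepted by the compiler: `stale` does not count as a borrow of `s`.
    // The push may reallocate the buffer, after which `stale` dangles.
    s.push_str("def");
    // Had `stale` been a reference (`let r = &s;`) that is still used after
    // the push, the borrow checker would reject the mutation above.
    let _never_dereferenced = stale;
    s
}
```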

Locals are dropped at the end of the block they are defined in.

Temporaries are dropped... well, I don't know, actually. I'd like to say "at the end of the (innermost) statement where they are produced," but sometimes the borrow checker complains when you try to chain seemingly innocent method calls on temporaries.
(Edit: I just checked the MIR, and vec![1, 2].len() + 1 does indeed drop the Vec before performing the addition, so it has nothing to do with statements.)

In practice, it seldom matters, because the borrow checker will almost always stop you from doing something bad. CStr::as_ptr is an exception to this since it returns a pointer; the fact that you can call it on a temporary CString is a well-known footgun that is even explicitly detected by clippy.
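One related rule that I am fairly sure of from the Rust Reference: a temporary normally lives no longer than the statement that creates it, but a temporary borrowed with `&` directly in a `let` initializer has its lifetime extended to the end of the enclosing block. A minimal sketch, with a hypothetical function name:

```rust
use std::ffi::{CStr, CString};

fn len_via_extended_temporary(path: &str) -> usize {
    // The `&` directly in the `let` initializer extends the temporary's
    // lifetime to the end of this block, so `owned` stays valid below.
    let owned: &CString = &CString::new(path).expect("interior nul byte");
    let ptr = owned.as_ptr();
    // Sound: the buffer behind `ptr` is still alive at this point.
    let s = unsafe { CStr::from_ptr(ptr) };
    s.to_bytes().len()
}
```

This extension does not apply to `let ptr = CString::new(path).unwrap().as_ptr();`, because there the `let` binds the raw pointer rather than a reference to the temporary.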

Thanks all. I will look at this some more. Just need to reread and work through more examples involving as_ptr(). @ExpHP Thanks for introducing me to the term footgun. I look forward to using that phrase at some point in the future.

To illustrate what happened in your example:

foo(CString::new(...).as_ptr())

becomes [EDIT: does not become]

foo({
    let temp = CString::new(...);
    temp.as_ptr()
})

And this can be rewritten to the equivalent:

let temp = CString::new(...);
let ptr = temp.as_ptr();
mem::drop(temp);
foo(ptr)

TL;DR: be careful when getting raw pointers to anonymous RAII elements

This isn't right, is it?

foo(CString::new(...).as_ptr())

is what actually works, because it doesn't become:

let temp = CString::new(...);
let ptr = temp.as_ptr();
mem::drop(temp);
foo(ptr)

The 2nd example here:

unsafe {
    let mut oldfilestat: libc::stat = mem::zeroed();
    let oldfileptr = CString::new(oldfilepath.to_str().unwrap()).unwrap();
    let ret = libc::stat(oldfileptr.as_ptr(), &mut oldfilestat);
    assert_eq!(ret, 0);
}

worked.

You're right, and it turns out I was wrong: I thought Rust was more conservative than this.

Here are some playground results:

fn main ()
{
    println!("=== Test 1 ===");
    unsafe {
        print(UAFChecker::new("Hello").as_ptr()); // no UAF
    }

    println!("=== Test 2 ===");
    unsafe {
        let ptr = UAFChecker::new("Hello").as_ptr();
        print(ptr); // UAF
    }

    println!("=== Test 3 ===");
    unsafe {
        let s = UAFChecker::new("Hello");
        print(s.as_ptr()); // no UAF
    }
}

I had expected the first test to also fail, but I think the real desugaring of the anonymous binding (in that case) is:

foo(CString::new("Hello").as_ptr())

// becomes
{
    let temp = CString::new("Hello");
    let ret = foo(temp.as_ptr()); // temp not dropped yet
    ret
}
  • which is weird, since temp lives longer than it needs to (remember, since there are no borrows here, as far as Rust is concerned, foo is not using temp)

In any case, the reason why the second test fails is that in that case the desugaring is:

let ptr = UAFChecker::new("Hello").as_ptr();
foo(ptr); // UAF

// becomes
let ptr = {
    let temp = UAFChecker::new("Hello");
    temp.as_ptr()
}; // temp dropped
foo(ptr); // UAF
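The UAFChecker type is not shown in this thread, so here is a rough, safe way to observe the same ordering. It is only a sketch under my own assumptions: `Tracked`, `token`, `use_token` and the three test functions are made-up names, and `token()` plays the role of as_ptr() by returning a value that carries no borrow.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical RAII type that records its creation and destruction.
struct Tracked {
    log: Log,
}

impl Tracked {
    fn new(log: &Log) -> Self {
        log.borrow_mut().push("created");
        Tracked { log: Rc::clone(log) }
    }

    /// Like `as_ptr()`, returns a value that carries no borrow of `self`.
    fn token(&self) -> u8 {
        0
    }
}

impl Drop for Tracked {
    fn drop(&mut self) {
        self.log.borrow_mut().push("dropped");
    }
}

/// Stands in for the foreign call that consumes the pointer.
fn use_token(log: &Log, _token: u8) {
    log.borrow_mut().push("used");
}

/// Copies the events recorded so far.
fn snapshot(log: &Log) -> Vec<&'static str> {
    let events = log.borrow().clone();
    events
}

/// Mirrors Test 1: the temporary lives until the end of the call statement.
fn temporary_inside_call() -> Vec<&'static str> {
    let log = Log::default();
    use_token(&log, Tracked::new(&log).token());
    snapshot(&log)
}

/// Mirrors Test 2: the temporary is dropped at the end of the `let` statement.
fn temporary_in_let() -> Vec<&'static str> {
    let log = Log::default();
    let token = Tracked::new(&log).token();
    use_token(&log, token);
    snapshot(&log)
}

/// Mirrors Test 3: a named binding lives until the end of its block.
fn named_binding() -> Vec<&'static str> {
    let log = Log::default();
    {
        let tracked = Tracked::new(&log);
        use_token(&log, tracked.token());
    } // `tracked` is dropped here
    snapshot(&log)
}
```

Only the second function records "dropped" before "used", which corresponds to the use-after-free in Test 2.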

Proof

The real way to know what Rust does is to have a look at the MIR:

Let's inspect the following function:

unsafe fn foo (input_vec: Vec<u8>)
{
    let ptr = CString::from_vec_unchecked(input_vec).as_ptr();
    // let drop_ret_val: () =
    bar(ptr);
}

Here is the generated MIR (with panic = "abort", and having manually renamed the variables myself):

What tool did you use to generate the MIR?

Haven't tried it yet, but this may be a clue: Visualizing your rust code using graphviz | Jonathan Steyfkens

When I've made MIR graphs in the past, I just used --unpretty=mir to print the MIR, deleted a bunch of cruft and adjusted the names, and then manually wrote a dot file, converting line breaks to \l inside labels:

digraph graphname {
    graph [fontname = "courier", splines="ortho"];
    node [fontname = "courier", shape="box"];
    edge [fontname = "courier"];

    bb0 [label="bb0:\n_d = discriminant(((*_a).0));\lswitchInt(move _d) -> [\l    1isize: bb1,\l    otherwise: bb2,\l];\l"]
    bb0 -> bb1;
    bb0 -> bb2;

    bb1 [label="bb1:\nStorageLive(_b);\l_b = &mut ((((*_a).0) as Some).0);\l_a = &mut (*(*_b));\lgoto -> bb2;\l"];
    bb1 -> bb2;

    bb2 [label="bb2:\nStorageDead(_b);\lStorageLive(_n);\ldiscriminant(_n) = 0;\ldrop(((*_a).0));\l((*_a).0) = move _n;\lStorageDead(_n);\lreturn;\l"]
}

...and then I used GraphViz dot -Tsvg to make an SVG, and post-processed it in Inkscape to add boldface and color to some words.

cargo rustc -- --unpretty=mir-cfg with graphviz / dot, after renaming the variables (they are automatically named _0, _1, etc. which is not very readable)
