Imagine the following "lib" (I am creating a custom Box
rather than a Vec
for the sake of simplicity):
A custom Box
type
use ::std::{*,
ptr::NonNull,
};
/// Simplified raw allocation interface
pub
trait IsAllocator {
/// Allocates memory for a T
fn alloc<T> () -> NonNull<T>;
/// Frees the pointed memory, but does **not** drop the pointee
unsafe fn free<T> (_: NonNull<T>);
}
pub
enum Allocator {}
impl IsAllocator for Allocator { /* ... */ }
/// Box<T> is simpler than Vec<T> (always len == capacity == 1)
pub
struct Box<T> {
ptr: NonNull<T>, // Since we are the unique owner of T, covariance is fine
}
impl<T> Box<T> {
pub fn new (value: T) -> Self
{
let non_null = Allocator::alloc();
unsafe { ptr::write(non_null.as_ptr(), value); }
Self { ptr: non_null }
}
pub
fn get (&'_ self) -> &'_ T
{
unsafe { self.ptr.as_ref() }
}
pub
fn get_mut (&'_ mut self) -> &'_ mut T
{
unsafe { self.ptr.as_mut() }
}
}
impl<T> Drop for Box<T> { fn drop (&mut self) {
unsafe {
// before freeing the allocated memory, we may need to recursively
// drop the pointee, in case it owns data with drop glue
// (else we would leak memory, with, for instance, a Box<Box<i32>>)
eprintln!("Dropping the Box at {:p}", self);
if mem::needs_drop::<T>() {
eprintln!(
"[Recursively dropping the pointee at {:p}]", self.ptr,
);
ptr::drop_in_place::<T>(self.as_mut());
} else {
eprintln!("[No need to drop the pointee at {:p}]", self.ptr);
}
Allocator::free(self.ptr);
}
}}
Basic usage
Now, imagine it being tested with the following program:
fn main ()
{
let s = String::from("Hello, World!");
let mut p = Box::new(None);
*p.get_mut() = Some(&s);
dbg!(p.get());
} // p is dropped; then s => OK
This compiles and run fine.
More convoluted example
Let's invert the order of the p
and s
declarations:
fn main ()
{
let mut p = Box::new(None);
let s = String::from("Hello, World!");
*p.get_mut() = Some(&s);
dbg!(p.get());
} // s is dropped before p, but &String has no drop glue => OK
This no longer compiles:
error[E0597]: `s` does not live long enough
--> src/main.rs:92:25
|
92 | *p.get_mut() = Some(&s);
| ^^ borrowed value does not live long enough
93 | dbg!(p.get());
94 | } // s is dropped before p, but &String has no drop glue => OK
| -
| |
| `s` dropped here while still borrowed
| borrow might be used here, when `p` is dropped and runs the `Drop` code for type `Box`
|
= note: values in a scope are dropped in the opposite order they are defined
emphasis on:
But in our case the only way we might use (deref) the pointee is when dropping it if it has drop glue, which &String
does not have.
So, when there is no drop glue, the pointer may dangle. Luckily, this can be expressed with the dropck_eyepatch
unstable feature:
unsafe impl<#[may_dangle] T> Drop for Box<T> { fn drop (&mut self) {
/* [the Drop logic has not changed] */
}}
"This compiles and runs fine" ?? Road to UB-land
Now that we have the (unstable
and) unsafe
impl Drop<#[may_dangle] T>
, with nothing else changed, we will find ourselves with the problem that without realizing it, we have told dropck
to trust us too much: when T
has drop glue, we do use the pointer!; and there is actually no place where we have told dropck
that (remember that as far as dropck
is concerned, all the implementations of all functions are loop {}
, i.e., it can obviously not look at the code being actually run; it just interprets function signatures).
Let's abuse it in action, with the following program:
#[repr(transparent)]
#[derive(Debug)]
struct PrintOnDrop<T : Debug> (T);
impl<T : Debug> Drop for PrintOnDrop<T> { fn drop(&mut self) {
eprintln!("Dropping {0:?} at {0:p}", self);
}}
fn main ()
{
let mut p = Box::new(None);
let s = String::from("Hello, World!");
*p.get_mut() = Some(PrintOnDrop(&s));
dbg!(p.get());
} // s is dropped before p, and PrintOnDrop<&String> has drop glue => UAF
This prints the following:
[src/main.rs:105] p.get() = Some(
PrintOnDrop(
"Hello, World!"
)
)
[Dropping the Box at 0x7ffedcb20b20]
[Recursively dropping the pointee at 0x556876d62a40]
Dropping PrintOnDrop("\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}orld!") at 0x556876d62a40
And there it is, a magnificent UAF, because dropck
allows us to have Foo<T>
exit a scope with a dangling T
(such as T = PrintOnDrop<&'dangling String>
) when:
-
unsafe impl<#[may_dangle] T> Drop for Foo<T>
is used, -
struct-wise,
Foo<T>
does not ownT
- when
Foo<T>
owns aT
it is expected that T::drop() be called whenFoo<T>
is dropped (else the implementation would leak memory), thus "ignoring" the<#[may_dangle] T>
attribute whenT
does have drop glue.
- when
That second point is then paramount, and in our Box<T>
case, all we had was a NonNull<T>
pointer which on its own does not own T
(it's just a newtype wrapper around *const T
(with a non-zero lang attached to it)).
What does own T
? Well, for instance, PhantomData<T>
does.
FINAL FIXED VERSION
Hence, we must fix the library code with the following one:
/// Box<T> is simpler than Vec<T> (always len == capacity == 1)
#[allow(non_snake_case)]
pub
struct Box<T> {
ptr: NonNull<T>,
_owns_T: marker::PhantomData<T>,
}
impl<T> Box<T> {
pub fn new (value: T) -> Self
{
let non_null = Allocator::alloc();
unsafe { ptr::write(non_null.as_ptr(), value); }
Self { ptr: non_null, _owns_T: Default::default() }
}
pub
fn get (&'_ self) -> &'_ T
{
unsafe { self.ptr.as_ref() }
}
pub
fn get_mut (&'_ mut self) -> &'_ mut T
{
unsafe { self.ptr.as_mut() }
}
}
unsafe impl<#[may_dangle] T> Drop for Box<T> { fn drop (&mut self) {
unsafe {
// before freeing the allocated memory, we may need to recursively
// drop the pointee, in case it owns data with drop glue
// (else we would leak memory, with, for instance, a Box<Vec<_>>)
eprintln!("Dropping the Box at {:p}", self);
if mem::needs_drop::<T>() {
eprintln!(
"[Recursively dropping the pointee at {:p}]", self.ptr,
);
ptr::drop_in_place::<T>(self.get_mut());
} else {
eprintln!("[No need to drop the pointee at {:p}]", self.ptr);
}
Allocator::free(self.ptr);
}
}}
This no longer compiles, and if we replace PrintOnDrop(&s)
with &s
it does still compile: yay!
error[E0597]: `s` does not live long enough
--> src/main.rs:105:37
|
105 | *p.get_mut() = Some(PrintOnDrop(&s));
| ^^ borrowed value does not live long enough
106 | dbg!(p.get());
107 | } // s is dropped before p, and PrintOnDrop<&String> has drop glue => UAF
| -
| |
| `s` dropped here while still borrowed
| borrow might be used here, when `p` is dropped and runs the `Drop` code for type `Box`
|
= note: values in a scope are dropped in the opposite order they are defined
Conclusion
Using PhantomData<T>
for its dropck
properties for some struct Foo<T>
is only required when there is:
-
unsafe impl<#[may_dangle] T> Drop for Foo<T>
(with the contract that T should then not be accessed except for a "potential" recursiveT::drop()
) -
Foo<T>
does not ownT
struct-wise (for instance, when it only has a non-Unique
pointer toT
like*const T
) and yet it dropsT
- Aside: when, by construction, the pointer is guaranteed to be NonNull, unaliased, and dropping the pointee, then
(NonNull<T>, PhantomData<T>)
is the right way to express it: that is,Unique<T>
.
- Aside: when, by construction, the pointer is guaranteed to be NonNull, unaliased, and dropping the pointee, then