What unsoundness can be caused by a destructor without PhantomData?

The old Rustonomicon says:

Another important example is Vec, which is (approximately) defined as follows:

struct Vec<T> {
    data: *const T, // *const for variance!
    len: usize,
    cap: usize,
}

Unlike the previous example, it appears that everything is exactly as we want. Every generic argument to Vec shows up in at least one field. Good to go!

Nope.

The drop checker will generously determine that Vec does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor.

In order to tell the drop checker that we do own values of type T, and therefore may drop some T's when we drop, we must add an extra PhantomData saying exactly that:

use std::marker;

struct Vec<T> {
    data: *const T, // *const for variance!
    len: usize,
    cap: usize,
    _owns_T: marker::PhantomData<T>,
}

But what unsoundness does the absence of PhantomData actually cause?

I have seen that the document also says:

But ever since RFC 1238, this is no longer true nor necessary.

But the core question remains. In any case, data is a *const T, so nothing can be done with it outside an unsafe block. If someone does change data inside an unsafe block, it seems PhantomData can do nothing to prevent that from compiling.

I think in the past, the drop check was only done when you "owned" T (which you can declare with PhantomData<T>). This is also what the Nomicon still says in the Table of PhantomData patterns.

But AFAIK, now this holds generally, as explained on the Drop Check page in the Nomicon:

For a generic type to soundly implement drop, its generics arguments must strictly outlive it.

So it's no longer needed, I guess? But not sure if I'm correct.
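
As a minimal sketch of that rule on a current stable compiler (Reporter and report_on_drop are made-up names for illustration, not taken from the Nomicon), here is a generic type whose Drop impl reads its parameter. It is accepted only because the borrowed String is declared before the value that borrows it, so it strictly outlives it:

```rust
use std::cell::RefCell;
use std::fmt::Debug;

// Hypothetical type: generic over `T`, with a destructor that reads `T`.
struct Reporter<'log, T: Debug> {
    value: Option<T>,
    log: &'log RefCell<Vec<String>>,
}

impl<'log, T: Debug> Drop for Reporter<'log, T> {
    fn drop(&mut self) {
        // `T` is inspected here, so it must still be valid at this point.
        self.log
            .borrow_mut()
            .push(format!("dropping {:?}", self.value));
    }
}

fn report_on_drop() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        // `s` is declared first, so it is dropped after `reporter`.
        let s = String::from("short-lived");
        let mut reporter = Reporter { value: None, log: &log };
        reporter.value = Some(s.as_str());
        // Swapping the two `let` lines above is rejected by the borrow
        // checker: `s` would be dropped before `reporter`'s destructor runs.
    }
    log.into_inner()
}
```

Calling report_on_drop() from main returns the single log line written by the destructor.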

You can read this prior thread.

In summary:

  • It used to work in the dangerous way
  • It hasn't since RFC 1238 in most cases, but
    • It's still true if you use the unstable #[may_dangle] (RFC 1327)
    • And probably it will stay that way, but
    • The RFCs left it open to revisit and that documentation update is rather new...
      • ...and the nomicon is explicitly non-normative, and the change was accepted with zero input (much less an FCP) from the Rust teams, so :man_shrugging:

Although unnecessary today, it doesn't hurt, so I still say throw it in your struct if the old wording applies to you.

(But what I really want is an official statement on the matter confirming the updated Nomicon. And what I really really want is for Rust to have a specification.)


So what's the unsafe part? Before the change, a destructor of a type with a generic type parameter was allowed to run after the type was invalid (i.e. when the generic type held a lifetime, the destructor could run after the lifetime "expired"). There were guard-rails: if the Drop implementation had a trait bound on the generic parameter, and the trait bound exposed any method, the loosening of the rules was not allowed (the outer type had to outlive its lifetime-carrying type parameter; the outer type was considered to own some instance of the type parameter; the drop had to run before the lifetime became invalid).

Unfortunately, the guard-rails were not enough. They are also not sufficient in the face of specialization, should that ever stabilize. So currently, the presence of the type parameter signals ownership... unless you use #[may_dangle].

It's still important in that case as #[may_dangle] doesn't even have the not-fully-effective guard-rails.

And now the only way you can drop after your lifetimes expire is if you use #[may_dangle] to signal that you promise not to look at what you're dropping. And if what you're dropping doesn't have drops of its own, that's great! But if it does, you need the PhantomData to signal that you "own" some of the generic parameter -- when you drop, some of the parameter will be dropped too. This allows dropck to correctly determine when it's sound for your type to drop or not.
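
The ownership half of this can be sketched on stable Rust, without #[may_dangle]. RawBox, CountDrop and drops_through_raw_box are hypothetical names, with RawBox standing in for a one-element Vec. It holds its T only through a raw pointer, and dropping it still runs T's own destructor, which is the situation the PhantomData marker describes:

```rust
use std::cell::Cell;
use std::marker::PhantomData;

// Hypothetical single-element owner, standing in for `Vec<T>`.
struct RawBox<T> {
    ptr: *mut T,
    // States "we own a `T` and may drop it". Per the thread, this is
    // redundant today when there is a plain `impl<T> Drop`, but harmless.
    _owns_t: PhantomData<T>,
}

impl<T> RawBox<T> {
    fn new(value: T) -> Self {
        RawBox {
            ptr: Box::into_raw(Box::new(value)),
            _owns_t: PhantomData,
        }
    }
}

impl<T> Drop for RawBox<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `Box::into_raw` in `new` and is freed only here.
        unsafe { drop(Box::from_raw(self.ptr)) };
    }
}

// Increments a shared counter when dropped, so the drop is observable.
struct CountDrop<'a>(&'a Cell<u32>);

impl Drop for CountDrop<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

fn drops_through_raw_box() -> u32 {
    let counter = Cell::new(0);
    {
        let _b = RawBox::new(CountDrop(&counter));
    } // dropping `_b` also drops the `CountDrop` it owns
    counter.get()
}
```

Because CountDrop borrows counter and has a destructor of its own, counter has to outlive the RawBox, and it does here because it is declared first.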

A full example that was linked to in the other thread.

@zylthinking1 doesn't the current documentation of PhantomData in the nomicon answer your question? Which part wasn't clear?

This used to be a recurring question given the out-of-date information in the nomicon, as @quinedot's curated list of threads and explanations illustrates, but I'd think that nowadays it shouldn't be the case anymore.

Indeed, the new documentation says (emphasis added):

I updated said docs precisely to clarify these things, so if something isn't clear then further improvements may be needed :slightly_smiling_face:

Sorry, it is my fault. I printed the book to PDF and read it on an e-ink device.
It turns out the collapsed section did not expand in the PDF. Only "Click here to see why" showed, and I did not notice that anything related was missing.

The

fn main() {
    let mut v: Vec<&str> = Vec::new();
    let s: String = "Short-lived".into();
    v.push(&s);
    drop(s);
} 

is exactly the example I wanted to find, showing how the unsoundness occurs.

I didn't realize that the T in Vec could be a reference. I was confused because I assumed that, since Vec owns T, T would always be alive in any code of Vec that can touch it, including drop.

The old text confuses me a bit.

I can understand why the compiler determines that Vec<T> does not own any values of type T without PhantomData, but why does that make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor?

  • If T is a primitive type such as i32 or a pointer, no T needs to be dropped; whether dropping is needed depends on the type T, not on the PhantomData.

  • Vec holds a pointer, which can become invalid at any point. So the compiler surely should worry about dereferencing it at any time, not only in Vec::drop.

In my current understanding, it seems to say that the compiler can't determine whether dereferencing the pointer held by Vec is safe. So a rule was made, saying that:

  • if a PhantomData exists, take it as a danger flag and refuse to compile any potentially harmful code;

  • without PhantomData, take it as safe, and the compiler will allow code like

    fn main() {
        let mut v: Vec<&str> = Vec::new();
        let s: String = "Short-lived".into();
        v.push(&s);
        drop(s);
    } 
    

to compile; if the pointer cannot in fact be validly dereferenced, UB occurs.

Because the book only says that an explicit PhantomData field is not needed today, the old text's explanation of why PhantomData was needed is not out of date.

I would like to know whether my current understanding is right.

yeah, the <details> section has this accessibility problem :thinking:

No, it's easy to dismiss accessibility problems as one's own "oversight" but I think this is a genuine problem of the documentation; I'll thus try to change that <details> when I have the time

Maybe the document should not focus on drop; it gives me the impression that PhantomData is strongly related to drop, which in fact it is not.

For example, the following code fails to compile even though no Drop has been implemented for vvec:

use std::marker;

struct vvec<T> {
    data: *const T, // *const for variance!
    len: usize,
    _owns_T: marker::PhantomData<T>,
}

impl<T> vvec<T> {
    fn new() -> Self {
        vvec {
            data: 0 as *const T,
            len: 1,
            _owns_T: marker::PhantomData::<T>,
        }
    }

    fn push(&mut self, x: T) {}
    
    fn hello(&self) {}
}


fn main() {
    let mut v: vvec<&str> = vvec::new();
    {
        {
            let s: String = "Short-lived".into();
            v.push(&s);
        }
        v.hello();
    }
}

Wait... the code without _owns_T fails to compile as well, so what is the effect of PhantomData?

use std::marker;

struct vvec<T> {
    data: *const T, // *const for variance!
    len: usize,
    //_owns_T: marker::PhantomData<T>,
}

impl<T> vvec<T> {
    fn new() -> Self {
        vvec {
            data: 0 as *const T,
            len: 1,
            // _owns_T: marker::PhantomData::<T>,
        }
    }

    fn push(&mut self, x: T) {}
    
    fn hello(&self) {}
}


fn main() {
    let mut v: vvec<&str> = vvec::new();
    {
        {
            let s: String = "Short-lived".into();
            v.push(&s);
        }
        v.hello();
    }
}

Depending on how you interpret things, you're either

  • creating a vvec<&'long str> (due to the .hello()) and trying to borrow the 'short-lived s for 'long, or
  • creating a vvec<&'short str> (due to pushing &'short s) and then trying to use it for 'long.

Either way it's a borrow check violation.

The borrow check considerations under discussion are for Drop::drop specifically. Normal uses aren't affected. Moreover, without a Drop implementation, you have no #[may_dangle], so you get owning semantics today.
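
For comparison, here is a sketch of the same shape that the borrow checker accepts, with or without the PhantomData field. VVec is the vvec type from above, renamed to follow Rust naming conventions; push counting its calls, hello returning that count, and the push_then_use function are additions made only so the result is observable. The only substantive change is that s now lives at least as long as the last use of v:

```rust
use std::ptr;

// Same layout as the `vvec` in the post; it still stores nothing.
#[allow(dead_code)]
struct VVec<T> {
    data: *const T,
    len: usize,
}

impl<T> VVec<T> {
    fn new() -> Self {
        VVec { data: ptr::null(), len: 0 }
    }

    // Only counts pushes; the pushed value itself is dropped right away.
    fn push(&mut self, _x: T) {
        self.len += 1;
    }

    fn hello(&self) -> usize {
        self.len
    }
}

fn push_then_use() -> usize {
    // `s` is declared in the outer scope, so the `&str` stored in the
    // type of `v` stays valid for every later use of `v`.
    let s: String = "Short-lived".into();
    let mut v: VVec<&str> = VVec::new();
    {
        v.push(&s);
    }
    v.hello()
}
```

Moving the declaration of s back inside the inner block brings back the borrow check error described above, independent of any Drop impl.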

OK, I would like to roll back to some old Rust version in which the following code (wrongly) compiles, so that PhantomData is required to make compilation fail.

use std::marker;

struct vvec<T> {
    data: *const T, // *const for variance!
    len: usize,
    // _owns_t: marker::PhantomData<T>,
}

impl<T> vvec<T> {
    fn new() -> Self {
        vvec {
            data: 0 as *const T,
            len: 1,
            // _owns_t: marker::PhantomData::<T>,
        }
    }

    fn push(&mut self, _x: T) {}
}

impl<T> Drop for vvec<T> {
    fn drop(&mut self) {}
}

fn main() {
    let mut v: vvec<&str> = vvec::new();
    {
        {
            let s: String = "Short-lived".into();
            v.push(&s);
        }
    }
}

I have tried 1.48 and 1.40. Both refuse to compile.

According to the document:

Another important example is Vec, which is (approximately) defined as follows:

struct Vec<T> {
    data: *const T, // *const for variance!
    len: usize,
    cap: usize,
}

Unlike the previous example, it appears that everything is exactly as we want. Every generic argument to Vec shows up in at least one field. Good to go!

Nope.

The drop checker will generously determine that Vec does not own any values of type T. This will in turn make it conclude that it doesn't need to worry about Vec dropping any T's in its destructor for determining drop check soundness. This will in turn allow people to create unsoundness using Vec's destructor.

In order to tell the drop checker that we do own values of type T, and therefore may drop some T's when we drop, we must add an extra PhantomData saying exactly that:

use std::marker;

struct Vec<T> {
    data: *const T, // *const for variance!
    len: usize,
    cap: usize,
    _owns_T: marker::PhantomData<T>,
}

There should be a version whose compiler accepts this code, but I can't find one. 1.40 is a very old version, and I was sure that the new behavior described by "But ever since RFC 1238, this is no longer true nor necessary." did not apply to it yet.

Or there is some misunderstanding; maybe

fn main() {
    let mut v: vvec<&str> = vvec::new();
    {
        {
            let s: String = "Short-lived".into();
            v.push(&s);
        }
    }
}

is not the correct example to show how unsoundness occurs without PhantomData.

RFC 1238 is older.

Here's a case, I believe. Inspired by #29106. Uncomment the PhantomData or move up a Rust version and compilation fails.

Yes, that is what I am looking for.
I didn't realize the behavior had changed since version 1.5.0. After all, I read the explanation about PhantomData in the age of Rust 2018.

It took me more than 2 years to find the final example. Thank you.
