Rustnomincon `Drain` and double dropping

I have been going over the Rustonomicon exercice of implementing a Vec. I'm a bit confused as to why the implementation provided for Drain, as it seems to me that the value will be dropped twice:

pub struct Drain<'a, T: 'a> {
    vec: PhantomData<&'a mut Vec<T>>,
    iter: RawValIter<T>,
}

impl<T> Vec<T> {
    pub fn drain(&mut self) -> Drain<T> {
        let iter = unsafe { RawValIter::new(&self) };

        // this is a mem::forget safety thing. If Drain is forgotten, we just
        // leak the whole Vec's contents. Also we need to do this *eventually*
        // anyway, so why not do it now?
        self.len = 0;

        Drain {
            iter,
            vec: PhantomData,
        }
    }
}

So here, we get an iterator (Drain implements iterator via RawValIter). The value returned by the iterator is read from *const T using ptr::read.
So as far as I understand, if we drop the value returned by the iterator and then drop the Vec, memory will be twice deallocated. That's in theory because running this seems to work fine, but I
can't figure out why. To me this seems similar to the below which does cause memory to be deallocated twice:

fn main() {
    let v: Vec<String> = vec!["0".into(), "1".into(), "2".into(), "3".into()];
    let s = unsafe { v.as_ptr().add(1).read() };
}

Could anyone help out with understanding why the Drain implementation doesn't cause this double drop?

Here is the full code for reference:

note the line with comments about mem::forget() safety inside the Vec::drain() function? it sets len to zero, this semantically transfers the ownership of the Vec's contents to the RawValIter.

now the Vec owns the heap allocated memory, but the RawValIter (virtually) owns the contents which would be ptr::read(). there's no double dropping here:

  • when Drain gets dropped, it explicitly drops the remaining items that had not been ptr::read() out, but it will NOT deallocate the buffer of the Vec.
  • when Vec gets dropped, it deallocates the raw memory of the Vec but will NOT call the destructor of the items;

Thank you, I think the big part which I didn't understand is that I thought a call to drop also dropped the underlying heap memory, when in facts it doesn't.

Last question if I can take some more of your time: imagine you execute the below code (for which I modified Drain, because I was trying to understand why there is this T: 'a requirement).

/// A draining iterator over a [`crate::Vec`].
pub struct Drain<'a, T> {
    // capture the lifetime of the mutable ref to the vector.
    _lt: std::marker::PhantomData<&'a ()>,
    iterator: RawIter<T>,
}

#[derive(Debug)]
struct NotCopy(String);

fn main () {
    let mut vec = Vec::new();
    vec.push(NotCopy(String::from("0")));
    vec.push(NotCopy(String::from("1")));

    let mut drain = vec.drain();
    let val = drain.next().unwrap();
    drop(drain);
    drop(vec);
    dbg!(val);
}

I understand that this should not work, T should only be allowed to live as long as Vec, because dropping Vec means we are deallocating the underlying memory via the drop of RawVec. But running this code works fine, and prints "0". How come I am allowed to still access it?

No, T: 'a means T lives at least as long as 'a. In this case, it means the values are guaranteed to last as long as the vec does or longer, since they're contained in the vec. String happens to be 'static, so it can live as long as you want.

1 Like

because dropping Vec means we are deallocating the underlying memory via the drop of RawVec .

right but isn't this statement still true?

String happens to be 'static , so it can live as long as you want.

But so changing my example to something like the below should also fail no? But this still compiles and runs fine.

pub struct Drain<'a, T: 'a> {
    _lt: std::marker::PhantomData<&'a mut Vec<T>>,
    iter: RawIter<T>,
}

#[derive(Debug)]
struct NotCopy(u64);

fn main () {
    let mut vec = Vec::new();
    vec.push(NotCopy(0));
    vec.push(NotCopy(1));

    let mut drain = vec.drain();
    let val = drain.next().unwrap();
    drop(drain);
    drop(vec);
    dbg!(val);
}

I guess I have two questions: first, why does the compiler allow me to access val, it should be tied to the 'a lifetime, and second, if I remove the T: 'a bound, why is val still allocated, since dropping vec should deallocate it.

u64 is also 'static.

  1. It is tied to it, but in the opposite way. It is required to outlive 'a, not live less than 'a. So when 'a ends by dropping the vec, that's entirely okay! The value wasn't dependent on the vec to begin with.
  2. Lifetimes have nothing to do with allocations, and can never change the behavior of a program: they can only accept or reject valid or invalid code.
2 Likes

ok I think I am starting to understand. So compiler is happy with the code because u64 outlives 'a as it is 'static.

But I guess this still doesn't resolve the ptr::read: NotCopy doesn't implement Copy, meaning reading from the RawIter just returns a bitwise copy of the underlying NotCopy present in the vector, so when the vector gets dropped and heap is deallocated, why can I still access a correct value for val (ref)?

All moving is, fundamentally, is a bitwise copy and invalidating the old place (so it can't be moved again). All Copy does is skip the invalidation step: it's still valid for any type, Copy or not, to be ptr::read under the right circumstances. Since you moved out val, it's no longer in the vec, so deallocating the vec doesn't affect it.

3 Likes

Perhaps the further messages cleared up what I'm about to write in this top section.

But this sounds like you think that the dynamic liveness scope of Vec<T> -- where you drop it -- is directly tied to 'a, due to T: 'a. But Rust lifetimes ('_ things) do not work like that. T: 'a means that the type T contains no lifetimes shorter than 'a, so the type is valid within the lifetime 'a. Note that this a restriction on the type, not on any particular value, and that by "lifetime" here I am not talking about liveness scopes -- where things drop.

@Kyllingene put it well:

The main connection between dropping and something like T: 'a is that T has to be a valid type when it is dropped if it has a non-trivial destructor.[1] Which means that T drops "somewhere in 'a". But it could be anywhere in 'a. Or put another way: the type of a value has to be valid everywhere it is used, and a non-trivial destructor is a kind of use.


Alright, let me try to go back to some actual code to talk about what the lifetime is about. Here's the Drain declaration from the Nomicon:

pub struct Drain<'a, T: 'a> {
    vec: PhantomData<&'a mut Vec<T>>,
    iter: RawValIter<T>,
}

Probably the only reason there's an explicit T: 'a there is that the example was written awhile ago. Today you could write it like this:

pub struct Drain<'a, T> {
    vec: PhantomData<&'a mut Vec<T>>,
    iter: RawValIter<T>,
}

This still has the T: 'a bound, but it's just inferred for you, because the &'a mut Vec<T> that appears in the fields requires it to be true in order to be valid. You use to have to state the bound explicitly.[2]

So that's the surface reason for the lifetime bound on Drain: it's required due to the field type.

The reason that the lifetime is present at all connects back to the only place these types are created:

impl<T> Vec<T> {
    // I added this to better communicate that the `Drain` holds on
    // to an exclusive borrow of `self`
    //                               vv
    pub fn drain(&mut self) -> Drain<'_, T> {

This is required so that the compiler knows to keep the Vec<T> exclusively borrowed so long as the Drain<'_, T> is around. That's what Rust lifetimes are generally about: the duration of borrows.

Now, in this version that you presented, there's no implicit or required T: 'a relationship anymore:

pub struct Drain<'a, T> {
    _lt: std::marker::PhantomData<&'a ()>,
    iterator: RawIter<T>,
}

So within the same module, you could construct a Drain<'static, &'not_static Ty>. But if the only construction site is the drain method, this won't actually happen in practice: that method requires T: 'a implicitly by having a self: &'a mut Vec<T> receiver.

Any place you could construct a Drain<'static, &'non_static Ty> wouldn't be connecting the lifetimes of a borrowed Vec<T> to the Drain, and thus would probably be broken in other ways. But some of the potential breakage just isn't possible with the unique construction site.


But hold on! There's another purpose to PhantomData besides "constraining" a generic parameter you don't have as a field (e.g. adding a lifetime that's not otherwise present). And that is to steer variance and auto-traits like Send and Sync. (In fact being able to decide those things are why you have to constrain generic parameters in the first place.)

Due to the use of *const T in the RawIter<T>,[3] you don't get the auto-traits without an unsafe implementation anyway (like here), so that's not too bad -- you're forced to think about the soundness of the auto-traits somewhere (by luck of having used *const T).[4]

However, your code using PhantomData<&'a ()>, Drain<'_, T> is covariant in T, whereas in the nomicon version, it's invariant in T. The covariance means, for example, you can coerce a Drain<'d, &'long str> to a Drain<'d, &'short str>.

I don't think this is actually exploitable in the provided code: the provided interface only exposes Ts in covariant ways (like handing you the T). And the original Vec<T> is always empty by the time the Drain<'_, T> is constructed, and stays that way. But I also believe that's only by pure luck, and undesirable even if sound.

Let's look at how this could lead to UB given a Drain with a more complete feature set.

Say you could obtain a &mut [T] from Drain as described here, and you also had the keep_rest method. Then you could start with a Vec<&'long str>, get a Drain<'_, &'long str> from the drain method, coerce it to a Drain<'_, &'short str>, obtain a &mut [&'short str], put a short-lived &str into the slice, and then call keep_rest. At that point the Vec<&'long str> becomes usable again, but now it contains a &'short str. This UB already at the language level, and can quickly lead to things like use-after-free.

Here it is mocked up in the playground.

If I'm correct and the covariance is actually sound in the OP due to not having these features, it still means these features could never be added:[5] going from covariant to invariant is a breaking change.


  1. There are some unstable and unsafe ways to loosen this a bit, which the Vec in std actually does use... but the Vec in the nomicon does not, so I'll not complicate things further by trying to explore that wrinkle. ↩︎

  2. Being explicit about it still does have a niche purpose, but only when non-Sized types are also involved, which doesn't apply here. ↩︎

  3. or RawValIter<T> as the nomicon calls it ↩︎

  4. Situations like this were the motivation for raw pointers not implementing the auto-traits, incidentally -- forcing you to think about it. ↩︎

  5. in a SemVer compatible way without a major version bump ↩︎

1 Like

I wouldn’t criticise it quite as heavily, as the standard library seems to be doing the same thing for Vec. Also e.g. for HashMap, in that case it’s documented internally and apparently deliberate.

Interesting. I wonder how aware they are of the implications.

Ah, it was deliberate for Vec:

Probably … not enough? I guess I’ll leave a comment on the tracking issue.

2 Likes

Right but in my example, I am dropping val, which is T, after 'a since I don't have an exclusive borrow on the Vec at that point anymore right? But maybe I am mistakenly liking the T: 'a bound present on the drain to the lifetime of the values T returned by the iterator implementation on the drain? I think that might be it because there is actually no reason in the Rustnomicon implementation for values returned by the Iterator implementation to be 'a.

And that is to steer variance and auto-traits like Send and Sync . (In fact being able to decide those things are why you have to constrain generic parameters in the first place.)

I had never thought about that, it is indeed an interesting usage of the PhantomData, I've always used it for constraining a generic parameter.
Would be interesting to have this stated in the example of the Rustnomicon, really helps understanding why they choose to use &'a mut instead of &'a.

EDIT: thanks a lot to everyone, this has become a lot more clear!

This is partially a rehash of what has been said above, but maybe it helps. The meaning of T: 'a is a minimum requirement, like other people have said. It may be a bit of a red herring if you look at it in isolation.

If we take a couple of steps back, the 'a bound is like a label for the continued borrow of your Vec after the call to drain. It says that

  • &'a or &'a mut references can only exist while the source Vec exists where it is,
  • and the existence of those references means that the source Vec is still considered borrowed and locked in place.

This extends to the existence of Drain<'a, T>, since it's considered to contain &'a mut T references. That means Drain<'a, T> is what has to drop for 'a to end, and 'a has to end before the source Vec drops or moves.

The second ingredient in creating a &'a mut T reference is s valid T value. If T contains a reference in itself, say if we store &'b MyType values, those references must not end before we are done with our &'a mut T references, a.k.a. &'a mut &'b MyType references. If the source of our MyType value would drop too early, we would end up with a dangling reference to where it was.

That is why we write T: 'a. It says that if T is bound by any lifetime, that lifetime has to cover the lifetime 'a as well. In our example, 'b has to be longer than or equal to 'a for you to be able to have Drain<'a, &'b MyType>. It's once again more of a restriction on the existence of Drain<'a, T>, rather than T.

That also means that when you move a value of T out, the lifetime 'a doesn't matter anymore. You own the value now, instead of borrowing it from the source vector. You can keep it for as long as any of its internal reference are valid, if there are any at all.

1 Like