Drop values as soon as possible

Hey,

I'm writing a program that operates on some very big graphs that, in total, don't fit into my computer's memory. I process the graph step by step, and sometimes copy it into a new one.
A simple test showed that Rust in release mode only drops values when they go out of scope, which in my case is at the end of the function.

Is there a good reason for keeping them alive that long, or is this just for simplicity? They could also be dropped right after their last use in the function.
I know that I can call drop to free the memory taken by a graph early, but that's tedious and error-prone.
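For illustration, here's a minimal sketch of the manual approach I mean (the big `Vec` is just a stand-in for my actual graph type):

```rust
fn main() {
    // Hypothetical stand-in for a large graph: one big allocation.
    let graph = vec![0u8; 1_000_000];
    let processed = graph.clone(); // "copy it into a new one"
    drop(graph);                   // manually free the original right away
    // `graph` can't be used past this point. Forgetting the `drop` keeps
    // both allocations alive until the end of the function.
    assert_eq!(processed.len(), 1_000_000);
}
```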

Would there be any downsides to releasing resources that will never be used again immediately after their last usage?

For data types that implement Drop, dropping early could introduce bugs. It is not uncommon to have a "guard" type that holds a lock, for instance, and there is no way for the compiler to know what specifically the lock is protecting.

That said, it would be great if there were a way for the compiler to know that it could drop objects early because there are no side effects to doing so.

2 Likes

Would you mind giving an example of code where dropping earlier would introduce bugs that would not be caught by the compiler?

Specifically, my current mental model is that dropping earlier would be equivalent to adding a call to std::mem::drop at the earliest point where it does not introduce a compile error due to the value being moved before a later use. I can't think of any case where that would then cause a bug: all the cases I can come up with dereference the lock guard (or similar) to access the locked data, and thus don't have an issue.
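As a sketch of the kind of code I have in mind (illustrative only, not from any real codebase), moving the drop up to the guard's last use appears harmless because the data is only reached through the guard:

```rust
use std::sync::Mutex;

fn main() {
    let data = Mutex::new(5);
    let mut guard = data.lock().unwrap();
    *guard += 1;  // last use of the guard
    drop(guard);  // where a hypothetical "earliest drop" would be inserted
    // Any later access re-locks the mutex, so moving the drop up
    // doesn't change behavior here:
    assert_eq!(*data.lock().unwrap(), 6);
}
```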

Looks like something similar to NLL, but for the values and not the references.

2 Likes

This was called “early drop”, and we didn’t implement it because of worries about unsafe code. Yes, the compiler could tell for safe code, and it would be fine, but unsafe code cannot, by definition, be checked.

2 Likes

I was not thinking about unsafe code when I wrote the following, so it may be moot, but I believe it would be very difficult and of questionable utility to do this even for safe code.

There's currently no way for the compiler to know that. The borrow checker runs under the assumption that values are dropped at the end of their scope, in reverse creation order -- it uses that assumption to determine whether all the lifetimes are valid and the compiler can proceed with codegen. All that lifetime information is then thrown out.

So you're talking about major modifications to borrowck, most likely, because you have to back out a fundamental assumption (values are dropped at the end of scope in reverse creation order) and invert it somehow to answer the question "When can each value be dropped to make the lifetimes valid?" I'm only passingly familiar with the compiler internals, but that's bound to be a lot of work.

And that's assuming it can be done, and it can be done in a way that is predictable and reliable -- because if you can't depend on the compiler to move your drops to the place where you want them, you're back to creating scopes and/or calling drop by hand.

There is code in the wild of the form:

use std::sync::Mutex;

struct Foo {
    lock: Mutex<()>,
}

impl Foo {
    fn do_something(&self) {
        let _g = self.lock.lock().unwrap();
        // now guaranteed you're holding the lock until scope ends
        // (assuming no manual `drop()` below)
    }
}

FFI or not, the compiler cannot reason about the validity of destroying _g early because that would mean it's essentially equivalent to:

let _ = self.lock.lock().unwrap();
// nothing is locked at all here!
// this is actually a semi-common footgun for newcomers

since _g isn't used for anything in the first example - it's just a lock-scope-in-a-binding, so to speak.

So while it's often true that you have a Mutex<T> where T is the actual thing you want to manipulate, and you do that via the guard, it's not always the case - sometimes you have a Mutex<()> just to represent/work with critical sections.

7 Likes

Note that, with NLL having landed in Rust 1.31, values are dropped based on the control flow graph, not on their lexical scope.

You're right that it's a lot of work! It took almost two years to ship. But it did ship now. :slight_smile:

Isn't this slightly incorrect?

IIRC, items with non-trivial drop glue (those that implement Drop or a member implements Drop, transitively) still live to the end of scope.

NLL regions (non-lexical lifetimes) allow borrow lifetimes to expire before the end of a lexical block if they are no longer used, but this only applies to trivially droppable types; types with non-trivial drop code have an implicit use at the end of the scope that drops them in reverse declaration order.

Example on playground
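For instance, a minimal sketch of the distinction:

```rust
fn main() {
    let mut s = String::from("hi");
    let r = &s;        // plain borrow: expires at its last use (NLL)
    println!("{r}");
    s.push('!');       // OK: the borrow of `s` has already ended
    assert_eq!(s, "hi!");
    // `s` itself has drop glue (it owns heap memory), so it is still
    // dropped at the end of scope, not at its last use.
}
```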

Yes, sorry, this doesn't apply to Drop types. That's also due to compatibility/usefulness concerns though, rather than to the difficulty of the analysis.

1 Like

I'll just add another example where early drop changes the behavior, although @vitalyd answered the question pretty well. It's important to remember that correctness is not all about safety. It's also important that existing code continue to do the same thing.

Imagine a pretty printer that closes braces by returning a type that prints the closing brace.

struct Close(&'static str);
impl Drop for Close {
    fn drop(&mut self) { println!("{}", self.0); }
}

It's an idiosyncratic way to close braces, but even idiosyncratic code should continue to give the same output when upgrading to a new Rust. And I could imagine scenarios where using drop like this could significantly clean up code. In a sense, it's a way to make the compiler guarantee that you never forget the closing brace, which is actually pretty powerful.
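To make the ordering observable without relying on stdout, here's a sketch of the same pattern that records output into a shared vector instead of printing (the `OUTPUT` static and `emit` helper are illustrative, not from the post above):

```rust
use std::sync::Mutex;

// Recorded "output", so the drop order can be asserted on.
static OUTPUT: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

struct Close(&'static str);
impl Drop for Close {
    fn drop(&mut self) { OUTPUT.lock().unwrap().push(self.0); }
}

fn emit(s: &'static str) { OUTPUT.lock().unwrap().push(s); }

fn main() {
    {
        emit("{");
        let _close = Close("}");
        emit("body");
    } // `_close` drops here, emitting the closing brace last
    assert_eq!(*OUTPUT.lock().unwrap(), ["{", "body", "}"]);
    // A hypothetical "early drop" at `_close`'s last use would emit "}"
    // before "body", silently changing the program's output.
}
```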

4 Likes

There's also code in the wild that looks like:

struct AbortOnPanic;
impl Drop for AbortOnPanic {
    fn drop(&mut self) {
        if std::thread::panicking() { panic!("panicking while panicking = abort"); }
    }
}

(should this be added to std::thread::panicking's docs?)

I didn't mean "Rust doesn't do this because it's hard to do," although it may have sounded like I was saying that. I meant, subjectively, that it's a lot more work than @farnz's statement suggests (even beyond what has already been done for NLL), and the payoff may not be good enough to justify that work even if there is no other reason not to do it.

I still think that querying for "what's the earliest point you could drop this value before causing an error" implies a major structural change to the compiler, even with NLL, and I'd be interested to find out if I'm wrong. But the points others have made make it more or less irrelevant to this thread.

1 Like

To be clear, that was a pure gedankenexperiment, and not something I think the compiler could implement; I was interested in cases where implementing that would cause previously working code to break. Were it to be worth implementing, I'd expect us to build atop the NLL machinery to find the last use of a value and insert the drop immediately after that.

@vitalyd, @Soni and @steveklabnik have more than answered my question with examples I hadn't considered, though, showing that it's unsound to make this change without some serious planning (you'd need something to mark values that need a later drop, for a start, and you'd want an edition change at a minimum so that cargo fix can insert that marker for every type that implements Drop).