How to efficiently drop a linked-list-like structure based on `Rc`, without triggering stack overflow aborts?


#1

[Small side-note section describing the context; skip to the next paragraph for the actual issue.]

Yesterday I stumbled upon a “horrible” bug in my code that triggered stack overflows, and no matter what I tried, pin-pointing it made it seem that the issue lay elsewhere than where I was looking for it… Moreover, the stack overflow seemed to happen after any piece of code…

It turns out that the issue was caused by a recursive drop call in a linked-list-like structure based on Rc.


The problem

Given a structure similar to the following:

struct Node <V : Default> (Rc<NodeData<V>>);

struct NodeData <V : Default> {
    value : V,
    next : Option<Node<V>>,
}

The default ops::Drop implementation would recurse through the linked structure, thus eventually triggering a stack overflow abort. (This happens for lists longer than ~40k elements.)

Therefore one should implement an alternative ops::Drop that avoids this issue.

The question

How would one efficiently implement an “iterative drop”, one that avoids the recursive calls?


The proposed solution

My take on this is given in the following Rust playground snippet:

https://play.rust-lang.org/?gist=f9c325c8f09724c1485998779ec6413f&version=stable&mode=debug

(One can use DROP_ITERATIVE to enable or disable the functionality.)

Which features the following ops::Drop implementation:

impl <V : Default> ops::Drop for Node<V> {
    fn drop (&mut self) -> () {
        loop {
            // We get `Some(data)` if the RC's strong count is exactly 1, thus it will be dropped.
            let next = if let Some (data) = Rc::get_mut (&mut self.0) {
                // We try to extract the next node...
                // In doing so we need to swap the node's internal data with some "dummy" data.
                let mut swap = NodeData { value : Default::default (), next : None };
                mem::swap (&mut swap, data);
                if let Some (next) = swap.next {
                    next
                } else {
                    return;
                }
            } else {
                // The underlying RC is used elsewhere, thus it won't be dropped.
                return;
            };
            // During the following assignment two things happen:
            // (1) `drop` is called once more, for the (now dummy) current value;
            // (2) `self` is replaced with `next`, thus "eating" one more step of the list;
            *self = next;
        }
    }
}
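For reference, here is a self-contained version of the snippet above that can be compiled directly; the `main` function (with its 400,000-element list, well past the ~40k depth that overflows the default drop) is only an illustrative harness I've added around the thread's code:

```rust
use std::mem;
use std::rc::Rc;

struct Node<V: Default>(Rc<NodeData<V>>);

struct NodeData<V: Default> {
    value: V,
    next: Option<Node<V>>,
}

impl<V: Default> Drop for Node<V> {
    fn drop(&mut self) {
        loop {
            // `Some(data)` only if this is the sole owner of the `Rc`.
            let next = if let Some(data) = Rc::get_mut(&mut self.0) {
                // Swap the node's internal data with a dummy, so we can
                // move the `next` link out of the shared allocation.
                let mut swap = NodeData { value: V::default(), next: None };
                mem::swap(&mut swap, data);
                match swap.next {
                    Some(next) => next,
                    None => return,
                }
            } else {
                // Still referenced elsewhere; nothing more to unlink.
                return;
            };
            // Dropping the (now dummy) current node recurses only one level.
            *self = next;
        }
    }
}

fn main() {
    let mut head = Node(Rc::new(NodeData { value: 0u64, next: None }));
    for value in 1..400_000u64 {
        head = Node(Rc::new(NodeData { value, next: Some(head) }));
    }
    drop(head);
    println!("dropped");
}
```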

The issue with the proposed solution

Unfortunately it seems that my implementation is not quite that efficient…

Trying to time the drop in the playground example yields comparable times with and without the iterative drop, which suggests the implementation is reasonably efficient.

However when I plug it into my actual code (a Scheme interpreter, where the linked-list-like Node is actually part of a generic Value object), the performance drops almost 10-fold…

The following are the links to my actual implementation:


Other alternative implementations?

Has anyone stumbled upon a similar issue before? Has anyone seen alternative implementations?

Someone on IRC pointed me to Rust’s LinkedList, which implements such an iterative drop technique; however, as LinkedList doesn’t use Rc but manual memory management instead, it doesn’t seem directly applicable to the use case of an Rc-based linked structure.


Thanks for the help,
Ciprian.


#2

EDIT: Just double checking, but you did compile in release mode, didn’t you? Dropping is slow in debug mode as mentioned, taking 447ms, but in release mode dropping 400,000 elements takes 14.735ms.

For perspective, the naive implementation mentioned below (playground) consistently takes 23ms to drop a 400,000 element list.

Ignore the original comment, it sounds like the issue is not compiling in release mode.
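For anyone reproducing the measurement, here is a minimal timing-harness sketch; the `Node` layout is my own simplification (not the original playground code), and the list is kept short so the compiler-generated recursive drop stays safe:

```rust
use std::rc::Rc;
use std::time::Instant;

struct Node {
    value: u64,
    next: Option<Rc<Node>>,
}

fn main() {
    // Keep the list well under the ~40k depth at which the default
    // recursive drop overflows the stack.
    let mut head: Option<Rc<Node>> = None;
    for value in 0..10_000u64 {
        head = Some(Rc::new(Node { value, next: head }));
    }
    let start = Instant::now();
    drop(head);
    // Compare numbers only from a release build (`cargo run --release` or
    // `rustc -O`); debug builds are easily an order of magnitude slower.
    println!("dropped in {:?}", start.elapsed());
}
```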


I imagine this problem isn’t unique to Rust, seeing as linked lists are used all over the place in computer science. Normally you’d use a Box instead of Rc because an Rc introduces the possibility of reference cycles, which is a massive pain to work around.

The typical way you delete a list is by continuously popping items off the top until there’s nothing left. I’m not a fan of the Drop impl you posted earlier, as it requires the type to implement Default. Instantiating a new V and swapping it in feels kinda weird.

Also, why is the Rc wrapping a NodeData when really it’s the node contents (V) we want to share? If you only need a singly-linked list, I’d write it somewhat like this:

struct Node<T> {
  item: Rc<T>,
  next: Option<Box<Node<T>>>,
}

struct List<T> {
  head: Option<Box<Node<T>>>,
}

Then you can implement pop() as

impl<T> List<T> {
  pub fn pop(&mut self) -> Option<Rc<T>> {
    let node = self.head.take()?;
    self.head = node.next;
    Some(node.item)
  }
}

And dropping a List<T> is just a case of popping until you’re empty.

impl<T> Drop for List<T> {
  fn drop(&mut self) {
    while self.pop().is_some() {}
  }
}

I think the biggest difference is I’ve introduced a List<T> and made Node<T> an internal implementation detail. If a Node<T> knows how to drop itself and the things it points to you’ll have the recursion issue on long lists.
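Putting the pieces together, here is a compilable sketch of this design; the `push` method and the `Option`-wrapped head are additions of mine needed to make it build and exercise the iterative drop:

```rust
use std::rc::Rc;

struct Node<T> {
    item: Rc<T>,
    next: Option<Box<Node<T>>>,
}

struct List<T> {
    head: Option<Box<Node<T>>>,
}

impl<T> List<T> {
    fn new() -> Self {
        List { head: None }
    }

    fn push(&mut self, item: T) {
        self.head = Some(Box::new(Node {
            item: Rc::new(item),
            next: self.head.take(),
        }));
    }

    fn pop(&mut self) -> Option<Rc<T>> {
        let node = self.head.take()?;
        self.head = node.next;
        Some(node.item)
    }
}

impl<T> Drop for List<T> {
    fn drop(&mut self) {
        // Pop iteratively so each `Node` is freed with `next == None`,
        // keeping the drop depth constant regardless of list length.
        while self.pop().is_some() {}
    }
}

fn main() {
    let mut list = List::new();
    for i in 0..400_000u64 {
        list.push(i);
    }
    assert_eq!(*list.pop().unwrap(), 399_999);
    drop(list);
    println!("ok");
}
```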


#3

@Michael-F-Bryan thanks for the feedback. I’ll try to address your three points:


(A) I did try to compile and run the code both in debug and release mode, and as I stated, for the “simple” Node implementation above the iterative drop algorithm doesn’t seem to have a large overhead.

However when I applied this technique in my “real” code the performance dropped considerably, mainly because there the next field is an enum, of which only one variant is a similar node.

Thus my question was more to the approach itself, given the structure.

(And yes, this issue is not particular to Rust itself, but to any reference-counted “smart pointer”-based system.)

As a side-note: my assumption is that the default ops::Drop implementation generated by Rust is heavily optimized (or simpler), hence its performance advantage.


(B) Indeed my Node structure doesn’t seem to be properly implemented, and it should in fact use an Option<Box<Node>> as its next field.

However the Node structure I presented is an oversimplified cons cell from a Scheme-based interpreter, which has the following simplified value semantics:

  • there are only “objects”;
  • some “objects” are “atomic” (like numbers, booleans, etc.);
  • there is a cons cell (i.e. pair) that has two fields: car and cdr (i.e. left and right), that can hold any “object”;
  • there is a special constant object null (or nil);
  • a “proper list” is any cons cell that has as its right field another “proper list” or the constant null; (the left field can be any object.)

Therefore, as a design decision, any object that is not atomic (i.e. doesn’t implement Copy) is in fact an Rc<InternalData>.

Thus you can see why my node is actually an Rc<NodeData>.


I “required” V : Default mainly so I can use the Default::default() function when swapping the NodeData with a dummy one. In my “real” code, as seen above, I replace the left and right values with an “atomic” object like the above-mentioned null.

The “weird” Rc::get_mut() and then mem::swap() calls are there mainly because I know of no other way to “move” the two left and right references out of the Rc itself.
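The get_mut-plus-swap step can be sketched in isolation; note that `mem::replace` is a one-call shorthand for the same swap-with-a-dummy trick (this standalone `NodeData` is a simplification of mine):

```rust
use std::mem;
use std::rc::Rc;

#[derive(Default)]
struct NodeData {
    value: u64,
    next: Option<Rc<NodeData>>,
}

fn main() {
    let mut rc = Rc::new(NodeData { value: 7, next: None });
    // `Rc::get_mut` only succeeds while we are the sole owner...
    if let Some(data) = Rc::get_mut(&mut rc) {
        // ...and `mem::replace` moves the old data out while moving the
        // `Default` dummy in, just like `mem::swap` with a fresh dummy.
        let old = mem::replace(data, NodeData::default());
        assert_eq!(old.value, 7);
    }
    println!("ok");
}
```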

The solution you propose is indeed efficient, but it relies on Box, which can be moved out of (unboxed). The same can’t be said about the Rc wrapper.


Thanks again for the feedback,
Ciprian.


#4

Well, Rc does have try_unwrap. A lot of what your code is trying to do is work around not using this method.


#5

@stevenblenkinsop Indeed Rc does have Rc::try_unwrap(), however that function “consumes” the Rc. Meanwhile, if the structure is Node (Rc<NodeData>), then given the ops::Drop signature of fn drop (&mut self), one can’t “move” the Rc out of self; one can only get a &mut Rc, hence my “contraption” of mem::swap()-ing the internal data.
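One way around the &mut limitation is to swap a dummy Rc into the slot first and then consume the moved-out Rc with Rc::try_unwrap. A sketch; `take_rc` and the dummy value are hypothetical helpers of mine, not code from the thread:

```rust
use std::mem;
use std::rc::Rc;

struct NodeData {
    value: u64,
    next: Option<Rc<NodeData>>,
}

// Hypothetical helper: replace the `Rc` behind a `&mut` reference with a
// dummy one, so the original `Rc` can be consumed by `Rc::try_unwrap`.
fn take_rc(slot: &mut Rc<NodeData>) -> Option<NodeData> {
    let dummy = Rc::new(NodeData { value: 0, next: None });
    let rc = mem::replace(slot, dummy);
    Rc::try_unwrap(rc).ok()
}

fn main() {
    let mut rc = Rc::new(NodeData { value: 42, next: None });
    let data = take_rc(&mut rc).expect("sole owner");
    assert_eq!(data.value, 42);
    println!("ok");
}
```

The cost relative to mem::swap of the internal data is one extra dummy allocation for the replacement Rc.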


#6

If you Option::take the next field, then the head is unlinked and you own the tail, so you can unbox tail nodes as needed for Drop optimization.

Which brings us back to the choice of how you structured the list, which you’ve already discussed as being inspired by how other languages represent objects. If you actually want an Rc linked list rather than Rc data, it might be more usual to structure it as follows. It’s essentially the same data structure, just with the abstraction boundaries moved, and where you put those boundaries affects how easy it is to work with. This might also make the behaviour of Clone as a refcount bump more obvious:

struct List<T>(Option<Rc<Node<T>>>);

struct Node<T> {
    value: T,
    next: Option<Rc<Node<T>>>,
}

This way, you can take the Rc. You need optionality in there somewhere, might as well put it where you can use it.
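With that layout, an iterative Drop can take the Rc out of the Option and unwrap it node by node. A sketch; the Drop impl and the test harness are my additions to the structs above:

```rust
use std::rc::Rc;

struct List<T>(Option<Rc<Node<T>>>);

struct Node<T> {
    value: T,
    next: Option<Rc<Node<T>>>,
}

impl<T> Drop for List<T> {
    fn drop(&mut self) {
        let mut head = self.0.take();
        while let Some(rc) = head {
            match Rc::try_unwrap(rc) {
                // Sole owner: take the tail, so this node is freed with
                // `next == None` and no recursion happens.
                Ok(mut node) => head = node.next.take(),
                // Shared elsewhere: stop unlinking; only the refcount drops.
                Err(_) => break,
            }
        }
    }
}

fn main() {
    let mut next = None;
    for value in 0..300_000u32 {
        next = Some(Rc::new(Node { value, next }));
    }
    drop(List(next));
    println!("ok");
}
```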


#7

@stevenblenkinsop Refactoring the structure (at least in the way you’ve proposed) is not an option because, as I said, the simplified Node structure I presented is sometimes used to build a linked-list structure, but – in the “real” code – it can hold arbitrary “objects” in either the value or the next fields.


For example I included the following Rust playground example which is still a simplified version of my “object” hierarchy, but one that highlights the “recursion” possibilities:

https://play.rust-lang.org/?gist=384f547c5ce1b2d9458fe082ca0fee63&version=stable&mode=debug

Just the structures:

enum Value {
    Number (i64),
    Cons (Cons),
    Null,
}

struct Cons (Rc<(Value, Value)>);

Moreover, for efficiency the following must hold: mem::size_of::<Value>() <= 16 (on 64-bit architectures). This is so that passing these “values” around by value stays cheap from Rust’s point of view, wasting just twice the size of an actual pointer.


It seems I could use instead a representation of Cons as:

struct Cons (Option<Rc<(Value, Value)>>);

Which seems to keep the size of Value at 16 bytes; however it “breaks” the semantics of a cons cell by introducing a “corner case” that is used only for the ops::Drop implementation…
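The size claim is easy to check: Rc is a non-null pointer, so Option<Rc<…>> uses the null value to encode None and stays pointer-sized, and the enum stays at 16 bytes (8-byte payload plus 8-byte tag). A sketch assuming the simplified types above:

```rust
use std::mem::size_of;
use std::rc::Rc;

enum Value {
    Number(i64),
    Cons(Cons),
    Null,
}

// The `Option` costs nothing in size: `None` is represented as a null
// pointer thanks to the niche optimization.
struct Cons(Option<Rc<(Value, Value)>>);

fn main() {
    assert_eq!(size_of::<Cons>(), size_of::<usize>());
    assert!(size_of::<Value>() <= 16);
    println!("ok");
}
```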


#8

Yeah, I was trying to edit my comment away from dealing with the structural side of things, but you’d already seen it. My initial response was intended more generally, and I got caught not having dug into the weeds of what you were looking to do :wink:.

Really what I was getting at is that you don’t need to construct a dummy Node, you only need a dummy tail. But since you’re using essentially (Null, Null) as your dummy Node, it probably doesn’t make much of a difference.

Anyways, this is more what I had in mind (playground):

impl Value {
    // Takes ownership of the `Value`, leaving `Value::Null` in its place.
    fn take(&mut self) -> Value {
        std::mem::replace(self, Value::Null)
    }
}

// Note: `Drop` is implemented on the internal representation so we don't have
// to reach through an `Rc` to something that might not even be being dropped.
impl Drop for ConsInternals {
    fn drop(&mut self) {
        let mut tail = self.right.take();
        while let Value::Cons(tail_cons) = tail {
            if let Ok(mut tail_internals) = Rc::try_unwrap(tail_cons.0) {
                tail = tail_internals.right.take();
            } else {
                break
            }
        }
    }
}
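A self-contained version of this Drop, assuming a ConsInternals struct with left/right fields behind the Cons wrapper (the struct definitions and `main` harness are inferred from the snippet, not copied from the playground):

```rust
use std::rc::Rc;

enum Value {
    Number(i64),
    Cons(Cons),
    Null,
}

struct Cons(Rc<ConsInternals>);

struct ConsInternals {
    left: Value,
    right: Value,
}

impl Value {
    // Takes ownership of the `Value`, leaving `Value::Null` in its place.
    fn take(&mut self) -> Value {
        std::mem::replace(self, Value::Null)
    }
}

impl Drop for ConsInternals {
    fn drop(&mut self) {
        let mut tail = self.right.take();
        while let Value::Cons(tail_cons) = tail {
            if let Ok(mut tail_internals) = Rc::try_unwrap(tail_cons.0) {
                // Each unlinked cell now has `right == Null`, so its own
                // `drop` terminates immediately instead of recursing.
                tail = tail_internals.right.take();
            } else {
                break;
            }
        }
    }
}

fn main() {
    let mut list = Value::Null;
    for i in 0..300_000i64 {
        list = Value::Cons(Cons(Rc::new(ConsInternals {
            left: Value::Number(i),
            right: list,
        })));
    }
    drop(list);
    println!("ok");
}
```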

Of course, this doesn’t help completely if someone is building trees instead of lists.


#9

@stevenblenkinsop No problem for the misunderstanding.

Regarding your proposed implementation of ops::Drop on ConsInternals itself, this is something I was pondering myself yesterday. It makes more sense to implement ops::Drop directly on this structure rather than on its Cons (Rc<ConsInternals>) wrapper; that way one can skip the case where the outer Rc still has references.

Regarding your observation that it doesn’t handle trees you are right… But I think that issue would be better served when I implement a proper GC system for my interpreter.

I’m curious whether this change, when plugged into my “real” code, will remove that awful overhead.


#10

One can hope, but it’s hard to see how without identifying the cause of the slowdown. If it’s just the fact that you have to call functions at all, then there’s not much you can do. A profiler might help.

You could do something like this in the meantime (playground):

impl Value {
    // Takes ownership of the `Value`, leaving `Value::Null` in its place.
    fn take(&mut self) -> Value {
        std::mem::replace(self, Value::Null)
    }

    // Returns the `ConsInternals` of the `Value` if it is a `Value::Cons` that
    // is uniquely owned. Returns the original `Value` as an error otherwise.
    fn try_unwrap_cons(self) -> Result<ConsInternals, Value> {
        match self {
            Value::Cons(cons) => Rc::try_unwrap(cons.0).map_err(|rc| Value::Cons(Cons(rc))),
            value => Err(value),
        }
    }

    // Returns a mutable reference to the `ConsInternals` of the `Value` if it
    // is a `Value::Cons` that is uniquely owned. Otherwise, returns `None`.
    fn get_cons_mut(&mut self) -> Option<&mut ConsInternals> {
        match *self {
            Value::Cons(ref mut cons) => Rc::get_mut(&mut cons.0),
            _ => None,
        }
    }
}

impl Drop for ConsInternals {
    // Drops a `Cons` list by repeatedly unlinking the head from the tail.
    // Handles tree-like structures by chaining together any lists encountered
    // in the `left` branch. The right-most leaf of the chain of left-values is
    // replaced with each left-value encountered in the original `Cons` list
    // until that list is empty, and then this chain of left-values is unlinked
    // in the same way as the original `Cons` list.
    // 
    //      A        tail  lefts    tail  lefts    tail    tail  lefts    tail
    //    /   \  ->  B—C   E—F   -> C     E—D   -> E—D  -> D     G     -> G
    //   E    B      |     |              |        |     
    //  / \  / \     D     G              G        G
    // G  F  D  C 
    //
    fn drop(&mut self) {
        let mut tail = self.right.take();
        let mut left_chain = self.left.take();
        while let Value::Cons(_) = tail {
            {
                let mut left_chain_foot = &mut left_chain;

                while let Ok(mut tail_internals) = tail.try_unwrap_cons() {
                    tail = tail_internals.right.take();

                    // `get_cons_mut` currently has to be called twice to 
                    // satisfy the borrow checker.
                    while let Some(_) = left_chain_foot.get_cons_mut() {
                        left_chain_foot = &mut moved(left_chain_foot).get_cons_mut().unwrap().right;
                    }

                    *left_chain_foot = tail_internals.left.take();
                }
            }

            tail = left_chain;
            left_chain = Value::Null;
        }
    }
}

// Helper to move `&mut` references rather than reborrowing them. Can also be
// spelled `{ value }`.
fn moved<T>(value: T) -> T {
    value
}