Arena based Linked List Manager - need help with returning references

I have a particular algorithm which I want to implement using doubly linked lists. I need to do this because the algorithm requires some linked list operations like (i) deleting an element inside the list or (ii) moving an element from one linked list to another.

I thought that the best way to do this may be to allocate an arena which holds all the elements, and has all the forward and backward links. And then, a linked list would constitute of a reference to an arena, and a pointer to the head of the specific linked list. The arenas would be implemented using a vector.

However, I am not able to figure out the correct lifetime parameter tricks to make the get function below to typecheck. Any help is appreciated. Also, if the code can be more idiomatic and less awkward somehow, please suggest how I can do so.

use std::cell::RefCell;
use std::rc::Rc;

struct VllPool<T> {
    mem : Vec<T>,
    freehead : Option<usize>,
    next : Vec<Option<usize>>,
    prev : Vec<Option<usize>>,
}

struct Vll<T> {
    pool : Rc<RefCell<VllPool<T>>>,
    head : Option<usize>,
}

struct VllPointer<T> {
    ptr : usize,
    vll : Rc<RefCell<Vll<T>>>,
}

impl<T> VllPool<T>{
    fn new(size : usize) -> VllPool<T> {
        assert!(size > 0);
        let mut mem : Vec<T> = Vec::with_capacity(size);
        unsafe {
            mem.set_len(size);
        }
        let freehead : Option<usize> = Some(0);
        // next = [Some(1), Some(2), ..., Some(size-1), None]
        let mut next : Vec<Option<usize>> = (0..size).map(|i| Some(i + 1)).collect();
        next.push(None);
        let prev : Vec<Option<usize>> = vec![None; size];
        VllPool { mem, freehead, next, prev }
    }
    fn new_vll(self_ : Rc<RefCell<VllPool<T>>>) -> Vll<T> {
        Vll { pool : self_.clone(), head : None }
    }
    fn has_space(&self) -> bool {
        self.freehead.is_some()
    }
    // creates a new node with content x
    // which is positioned after after, and before before,
    // i.e, after -> new -> before
    fn insert_new(&mut self, x : T, after : Option<usize>, before : Option<usize>) -> usize {
        assert!(self.has_space());
        let tmp = self.freehead.unwrap();
        self.freehead = self.next[tmp];
        self.next[tmp] = before;
        self.prev[tmp] = after;
        if let Some(b) = before {
            self.prev[b] = Some(tmp);
        }
        if let Some(a) = after {
            self.next[a] = Some(tmp);
        }
        self.mem[tmp] = x;
        tmp
    }
    // deletes the node at position p
    // p is the new head of the free list
    // if we previously had after -> p -> before
    // now we have after -> before
    fn delete(&mut self, p : usize) {
        let after = self.prev[p];
        let before = self.next[p];
        if let Some(b) = before {
            self.prev[b] = after;
        }
        if let Some(a) = after {
            self.next[a] = before;
        }
        self.next[p] = self.freehead;
        self.freehead = Some(p);
    }
    // moves p to a new singleton list
    fn move_to_singleton(&mut self, p : usize){
        // previously we had after -> p -> before
        // now we have after -> before
        let after = self.prev[p];
        let before = self.next[p];
        if let Some(b) = before {
            self.prev[b] = after;
        }
        if let Some(a) = after {
            self.next[a] = before;
        }
        // now we have p -> None
        self.next[p] = None;
        self.prev[p] = None;
    }
    fn get<'b>(&'b self, p : usize) -> &'b T {
        &self.mem[p]
    }
}

impl<T> Vll<T> {
    fn is_empty(&self) -> bool {
        self.head.is_none()
    }
    fn push(self_ : Rc<RefCell<Vll<T>>>, x : T) -> VllPointer<T> {
        let mut self1 = self_.borrow_mut();
        let tmp = {
            let mut pool = self1.pool.borrow_mut();
            pool.insert_new(x, None, self1.head)
        };
        self1.head = Some(tmp);
        VllPointer { ptr : tmp, vll : self_.clone() }
    }
    fn delete(p : VllPointer<T>) {
        let mut self1 = p.vll.borrow_mut();
        let headnext = self1.pool.borrow().next[p.ptr];
        if self1.head == Some(p.ptr) {
            self1.head = headnext;
        }
        let mut pool = self1.pool.borrow_mut();
        pool.delete(p.ptr);
    }
    // move current node to a new list
    fn move_to_singleton(p : VllPointer<T>) -> Vll<T>{
        let self1 = p.vll.borrow();
        let mut pool = self1.pool.borrow_mut();
        pool.move_to_singleton(p.ptr);
        Vll { pool : self1.pool.clone(), head : Some(p.ptr) }
    }
    // fn get(&self, p : VllPointer<T>) -> &T {
    //     let pool = self.pool.borrow();
    //     &pool.get(p.ptr)
    // }
}

fn main(){
    let pool : VllPool<u64> = VllPool::new(10);
    let poolref = Rc::new(RefCell::new(pool));
    let vll1 = VllPool::new_vll(poolref.clone());
    let vll2 = VllPool::new_vll(poolref.clone());
    let vll1ref = Rc::new(RefCell::new(vll1));
    let vll2ref = Rc::new(RefCell::new(vll2));

    let p11 = Vll::push(vll1ref.clone(), 1);
    let p12 = Vll::push(vll1ref.clone(), 2);

    let p21 = Vll::push(vll2ref.clone(), 1);
    let p22 = Vll::push(vll2ref.clone(), 2);
    let p23 = Vll::push(vll2ref.clone(), 3);

    Vll::delete(p21);
    Vll::delete(p11);

   let vll3 = Vll::move_to_singleton(p22);
}

To be clear, Vll (vector linked list) is the linked list type, and it is meant to be accessed via the VllPointer type

There's no valid reasonable way to implement get because nothing can prevent someone else from calling .borrow_mut() on the pool field of the Vll, thus getting a mutable reference to it while a shared one exists as the value returned by get.

So how should I proceed? What is the best way to represent this data structure I am interested in, in Rust?

The shallow answer is that your have to return the read guard Ref:

    // use std::cell::Ref;
    fn get(&self, p : VllPointer<T>) -> Ref<'_, T> {
        let pool = self.pool.borrow();
        Ref::map(pool, |pool| pool.get(p.ptr))
    }

...but I didn't analyze anything else, like if this is going to make you always panic in practice or something. Playground.

2 Likes

Fantastic! That compiles, and I seem to be able to run get and print the contents.

But I have more questions now. What is Ref and what is the '_ parameter? and what does Ref::map do?

RefCell tracks reference lifetimes at run-time, and panics if you violate the rules (instead of causing potential memory unsafety)

In order to do that it has to be able to know when references are still live. So borrow returns the Ref guard type which allows you to access the reference. When the Ref guard is dropped, the RefCell updates the lifetime information it's tracking.

'_ is a reserved lifetime name that's part of the lifetime elision feature. It means roughly "the default lifetime inferred for this function call". You could rewrite the signature of get without lifetime elision as

fn get<'a>(&'a self, p : VllPointer<T>) -> Ref<'a, T>

And it would mean the same thing, i.e. that the lifetime Ref contains is tied to the lifetime of the borrow of self

1 Like

Small note, and hopefully not derailing, but this code is not ok:

You need to either use MaybeUninit<T> as the items, or initialize the values (using vec.as_mut_ptr() and ptr.write()) before calling set_len(). Or just use push()!

7 Likes

I cannot initialize these values because I do not have Default value for an arbitrary type T. I could use more complex annotations, but this seems fine to me for the following reason:

When vec.with_capacity is called, wouldn't enough memory be allocated so that .set_len is safe? I do not intend to retrieve the elements of the array before they are actually initialized, via Vll::push.

Thanks! So, this lifetime annotation is no different from the annotation in VllPool::get.

I am still not sure what Ref::map does, though. Is this like a promise that promises to build this reference in the future, but not now?

Yes.

so that .set_len is safe?

No, because that's necessary but not sufficient. Consult the safety documentation:

Safety

  • new_len must be less than or equal to capacity().
  • The elements at old_len..new_len must be initialized.

You have satisfied the first requirement but not the second.

I saw that, but I have to wonder: what could go wrong?

Presumably, the manual says that because they are assuming that the elements of the vector may be accessed and that might result in getting some garbage. But I am going to not do that.

Well, for one thing, when you execute

self.mem[tmp] = x;

the first thing the assignment does is drop the old value of self.mem[tmp]. That's uninitialized memory being passed to Drop::drop, if T implements Drop. Definitely UB and likely to have immediately detected consequences, e.g. if T is String.

But that's just a concrete example that I was able to find. You should not assume that code is sound just because nobody is coming up with a plausible explanation of what could go wrong. That's not how Undefined Behavior works. Rather, the compiler and standard library are free to optimize your program into another program that is equivalent to the original under the assumption that the UB case never executes (which here means "new() is never called") — because that is the key to very many optimizations — and thus the behavior if the UB case does execute could be anything. Violating a safety requirement is like writing a logical contradiction — everything is possible afterward.

5 Likes

If T was String, and you never actually filled the array, you have random garbage wherever the uninitialized String values were. Sure rust probably memset everything to zero before-hand; but I don't like assuming that (malloc v.s. calloc)..

But now consider

struct FileHandle { fh: i32 }
impl Drop for FileHandle {
  fn drop(&mut self) { if self.fh > -1 { myfile::close(self.fh) }}
}
impl Default for FileHandle {
   fn default() -> Self { Self { fh: -1 } } 
}
impl FileHandle {
  fn new(fh: i32) -> Self { Self { fh } }
}

She's not going to be happy when you drop all the elements of the array. Yeah it's a crappy struct, but it's possible.

replace mem.set_len(size) with

       let mut mem: Vec<T> = vec![ T::default() ; size ];

sample code

  fn foo<T:Default+Clone>(size: usize) -> Vec<T> {
        let mut mem: Vec<T> = vec![ T::default() ; size ];
        mem
    }
    #[test]
    fn test1() {
        let v:Vec<String> = foo(17);
        println!("{v:?}");
    }

Note I needed Default and Clone to make the macro work. That might not work for you of course. Others' suggested MaybeUninit, but I'm not as familiar with that one.

Minor optimization - just because it always bugs me. Instead of pre-filling all the empty nodes (as this scales horribly), add a "top" usize.
On allocation

  fn alloc_slot(&mut self) -> Option<usize> {
      if self.top < self.mem.len() {
           let val = self.top;
           self.top += 1;
           Some(val)
      } else if !self.free_list.is_empty() {
          self.free_list.pop()
     } else {
          let val = self.top;
          self.mem.push(T::default());
          self.top += 1;
          Some(val)
     }
  }

Okay, thanks. I see the point you are making.

On a different note, is this the best way to implement this linked list thing? I tried using it to actually store things, and I am finding myself drowning in a lot of complex code

So I was intreaged. I'm not at all sure why you need a linked list; maybe you need to maintain a traversal order or something.. But I implemented a COMPLETELY USELESS implementation that lets me learn and understand a lot of wierd quirks in Rust.. OMG this was complex.

The things I learned:

  • (the wrapping / filtering Ref) that was a definitely cool new thing for me. Thanks @quinedot .
  • MaybeUninit. It makes a lot of sense to me now. Just massively unsafe.
  • How to filter code to conditionally enable a trait (like Default).. It was NOT obvious to me all the ways it can break, nor how the compiler rules could keep it straight.. What it seems to me is that you can't have any ambiguous mappings; so isolating to distinct structs (optionally in separate modules for clarity)
  • modules using public peer modules (got lots of compiler errors when I use super::Foo until I set it to pub use super::Foo
  • defaulted Trait implementations
  • ins and out of implementing an epoc-based-arena (though this isn't thread safe)
  • PhantomData
  • Inherited / extended traits

[Rust Playground](https://rust playground)

1 Like

Wow, that is a huge amount of code.

I need linked lists to implement an algorithm from 1987. This is the partition refinement algorithm by Tarjan and Paige. PDF. See section 3.

Here is a brief description of the data structures

This is a description of the main loop of the algorithm.

Rust does already have LinkedList in std::collections - Rust - but I would be very surprised if you actually saw a performance improvement from using it over a Vec or either of the Sets.

There's very little call for linked lists in practice - O(1) means very little if that 1 is extremely big (millions of cycles asymptomatically is pretty common).

Also I'm surprised the thread hasn't already linked Introduction - Learning Rust With Entirely Too Many Linked Lists

2 Likes

code is so much easier to read then math English.. sigh.