O no, not another lifetime/borrow question (and possible suggestion)

Hi.

I was trying to rebuild some old functions in rust (in particular: error correction using a trie). The algorithm requires keeping track of a set of states: node in the trie, position in the input, and some info on the error. In order to implement it efficiently, I want to allow these states to be able to refer to each other, like this:

struct State<'a> {
    index: u32,
    position: u16,
    distance: u8,
    previous: Option<&'a State<'a>>
}

These States would all live until the end of the function, and the previous pointers form a sort of DAG that allows reconstruction later on. There are other ways to implement it, but they require copying and adding information from the previous to the new State, which I'd like to avoid.

In order to satisfy the lifetime requirements, I thought I could have a Vec (or LinkedList) which contains all States during the function's execution, like this:

let mut states: LinkedList<Box<State>> = LinkedList::new();

I hoped to use states to keep track of all States, and be able to use &State in the other data structures (a queue of unprocessed States, and some sort of set to avoid duplicates). That however requires code to create a new State object that looks like this:

states.push_back(Box::new(State::new(...)));
let new_state = states.back().unwrap()

to which the compiler responds "cannot borrow states as mutable because it is also borrowed as immutable", and when replacing back() with back_mut() that it cannot borrow states as mutable twice.

I think I understand the reason why: allowing this could result in an invalid reference for new_state when that particular element is removed from states. I also think that that explains why e.g. list_tail from LinkedList is a Rawlink<Node<T>>: the compiler doesn't know that the element list_tail points at is kept alive by the fact that link_head points at it indirectly.

So, my question: is that analysis correct, and that does mean I can only do what I planned to do using "unsafe" code, or using Rc<>?

And the suggestion: if that is correct, it could be possible to have a type that guarantees its elements stay alive until it is destroyed itself, e.g. a Vec or LinkedList from which elements cannot be removed (and perhaps not mutated after being added either), and allow some relaxation of the borrowing rules for that type, since it is now possible to infer the lifetime of the element? That might help implementing certain algorithms (like mine) in a more efficient manner.

If I understand correctly you are allocating a lot of States but never dropping them until you're done with all of them. I think this would be a good use of an arena.

1 Like

Can you use a vector of states and just push the new ones in the back? You can use an index number to refer to the previous states (as it is a DAG all the things you want to point to will already be in the array).

I would have the vector owning the state.

You will get much better performance too due to locality and cheap allocation. (Providing the cost of relative addressing the array element does not dominate).

Not that many, in the order of a few hundred would be the normal use, but yes. I'll have a look at the arena (I infer from the linked page that it once existed as a standard type in rust, but has been dropped; any idea why?).

Thanks.

Yes, that's one way out of it. I guess I can allocate one with a sufficient capacity and skip the linked list (which was initially there to avoid problems with reallocation of the vector).

Edit: to clarify: using references would allow me to use the same State object in the other two data structures; when that's not possible, every action copies all members in State, which is more expensive, of course.

Thanks.

No need to allocate in advance, as you can push into a vector, its dynamically sized. (of course pre-allocating to the required size is more efficient). I think this is a better solution than a linked list, as it is more cache friendly.

Why is reallocation of the vector a problem. In general vectors scale with a power of two, so there will be less allocations with a vector than a linked list, and it is number of allocations that affect performance not size.

I actually wanted to test that, as I had read it before.

In my C++ implementation, a singly linked list was really pretty efficient, but I also made use of the fact that that list embedded the element in the list node, and then it could be used as State* throughout, because it will never move (and because I had a special allocator for small objects which was very fast). I was trying to emulate that behavior in rust by avoid the reallocating Vec, but that's clearly not the way to do it. So when I got to that point, I had already planned the idea of replacing LinkedList with Vec to see for myself if there would be a noticeable difference.

I'm going to try the arena, and if that doesn't work, indexes it is...

Thanks for your helpful replies, jethrogb and keean.

I would use a vector in C++ too, and I would use raw pointer to make a linked list. The vector would own the state data, and the links would be non-owning raw pointers. That way the whole list is in a contiguous vector, you can easily extend it, and the whole lot is disposed of when the vector goes out of scope.

It works. Thanks!

It was and still is a type internal to the compiler. On nightly you can use it like so:

#![feature(rustc_private)]

extern crate arena;

use arena::TypedArena;

But of course since this is an internal type this may stop working at any time.

I believe it was also extracted to https://crates.io/crates/typed-arena

That's the link jethrogb provided too. Works very well for this particular problem.

Looking at the source code for the arena crate, it pushes the new object into a vector, and returns a reference to the object in the vector.

I've seen that too. It has just one line of unsafe magic to make that happen. That's quite acceptable, although I must admit I might not understand all the nuances of that line (still learning).

I am not sure why it needs that unsafe code? It may be because the vector is mutable, and therefore the element could be removed from the collection. Does that mean if we had a vector with no delete operation (so you can add elements and mutate them, but not shrink the vector by deleting) that you could define this completely safely? Would the compiler recognise it as safe?

That wouldn't be sufficient, since even when only appending, elements can move when the vector capacity is exhausted and memory has to be reallocated. That's why the arena uses a vector of vectors with preallocated, and constant, capacity.

Of course, its easy to forget that a vector has to copy when it reaches its capacity. What happens if you heap allocate the memory for the object, and store an owning reference in the vector. This is what I would do in C++ using a vector of 'unique_ptr' to the objects. This also allows the objects to be polymorphic (differently sized) and has the same deallocation properties.

Would that be Vec<Box<T>> ?

Yes, that is basically the same as the current Vec<Vec<T>> with all inner Vecs having a capacity of 1.

But there's another essential feature to the arena that a general "append-only vector" wouldn't help with: the alloc method has to be alloc(&self) -> &mut T, otherwise you couldn't allocate more than one object. But this also means that alloc (i.e. push for the append-only vector) has to be the only way to get at the objects stored in it, otherwise you get aliased mutable references - meaning no indexing, no iteration, etc. So in effect, the arena is already the minimal interface and there's no more basic structure beneath.

Funny, that's precisely where I came from...

That is because you have to borrow the whole array mutably to append an item, so you could erase or change elements in the middle of the vector. My point was you can safely borrow the contents of an array/vector cell mutably, providing the only operation available on the vector is append, as you do not need to index or iterate the vector when it is used in an Arena. It's just a shame I can't think of a way to tell the borrow checker that.