O no, not another lifetime/borrow question (and possible suggestion)

TGV · May 8, 2016, 8:09am

Hi.

I was trying to rebuild some old functions in rust (in particular: error correction using a trie). The algorithm requires keeping track of a set of states: node in the trie, position in the input, and some info on the error. In order to implement it efficiently, I want to allow these states to be able to refer to each other, like this:

struct State<'a> {
    index: u32,
    position: u16,
    distance: u8,
    previous: Option<&'a State<'a>>
}

These States would all live until the end of the function, and the previous pointers form a sort of DAG that allows reconstruction later on. There are other ways to implement it, but they require copying and adding information from the previous to the new State, which I'd like to avoid.

In order to satisfy the lifetime requirements, I thought I could have a Vec (or LinkedList) which contains all States during the function's execution, like this:

let mut states: LinkedList<Box<State>> = LinkedList::new();

I hoped to use states to keep track of all States, and be able to use &State in the other data structures (a queue of unprocessed States, and some sort of set to avoid duplicates). That however requires code to create a new State object that looks like this:

states.push_back(Box::new(State::new(...)));
let new_state = states.back().unwrap()

to which the compiler responds "cannot borrow states as mutable because it is also borrowed as immutable", and when replacing back() with back_mut() that it cannot borrow states as mutable twice.

I think I understand the reason why: allowing this could result in an invalid reference for new_state when that particular element is removed from states. I also think that that explains why e.g. list_tail from LinkedList is a Rawlink<Node<T>>: the compiler doesn't know that the element list_tail points at is kept alive by the fact that link_head points at it indirectly.

So, my question: is that analysis correct, and that does mean I can only do what I planned to do using "unsafe" code, or using Rc<>?

And the suggestion: if that is correct, it could be possible to have a type that guarantees its elements stay alive until it is destroyed itself, e.g. a Vec or LinkedList from which elements cannot be removed (and perhaps not mutated after being added either), and allow some relaxation of the borrowing rules for that type, since it is now possible to infer the lifetime of the element? That might help implementing certain algorithms (like mine) in a more efficient manner.

jethrogb · May 8, 2016, 8:27am

If I understand correctly you are allocating a lot of States but never dropping them until you're done with all of them. I think this would be a good use of an arena.

keean · May 8, 2016, 8:41am

Can you use a vector of states and just push the new ones in the back? You can use an index number to refer to the previous states (as it is a DAG all the things you want to point to will already be in the array).

I would have the vector owning the state.

You will get much better performance too due to locality and cheap allocation. (Providing the cost of relative addressing the array element does not dominate).

TGV · May 8, 2016, 8:55am

Not that many, in the order of a few hundred would be the normal use, but yes. I'll have a look at the arena (I infer from the linked page that it once existed as a standard type in rust, but has been dropped; any idea why?).

Thanks.

TGV · May 8, 2016, 8:58am

Yes, that's one way out of it. I guess I can allocate one with a sufficient capacity and skip the linked list (which was initially there to avoid problems with reallocation of the vector).

Edit: to clarify: using references would allow me to use the same State object in the other two data structures; when that's not possible, every action copies all members in State, which is more expensive, of course.

Thanks.

keean · May 8, 2016, 9:20am

No need to allocate in advance, as you can push into a vector, its dynamically sized. (of course pre-allocating to the required size is more efficient). I think this is a better solution than a linked list, as it is more cache friendly.

Why is reallocation of the vector a problem. In general vectors scale with a power of two, so there will be less allocations with a vector than a linked list, and it is number of allocations that affect performance not size.

TGV · May 8, 2016, 9:29am

I actually wanted to test that, as I had read it before.

In my C++ implementation, a singly linked list was really pretty efficient, but I also made use of the fact that that list embedded the element in the list node, and then it could be used as State* throughout, because it will never move (and because I had a special allocator for small objects which was very fast). I was trying to emulate that behavior in rust by avoid the reallocating Vec, but that's clearly not the way to do it. So when I got to that point, I had already planned the idea of replacing LinkedList with Vec to see for myself if there would be a noticeable difference.

I'm going to try the arena, and if that doesn't work, indexes it is...

Thanks for your helpful replies, jethrogb and keean.

keean · May 8, 2016, 9:45am

I would use a vector in C++ too, and I would use raw pointer to make a linked list. The vector would own the state data, and the links would be non-owning raw pointers. That way the whole list is in a contiguous vector, you can easily extend it, and the whole lot is disposed of when the vector goes out of scope.

TGV · May 8, 2016, 10:05am

It works. Thanks!

jethrogb · May 8, 2016, 6:25pm

It was and still is a type internal to the compiler. On nightly you can use it like so:

#![feature(rustc_private)]

extern crate arena;

use arena::TypedArena;

But of course since this is an internal type this may stop working at any time.

steveklabnik · May 8, 2016, 7:03pm

I believe it was also extracted to https://crates.io/crates/typed-arena

TGV · May 8, 2016, 9:15pm

That's the link jethrogb provided too. Works very well for this particular problem.

keean · May 8, 2016, 9:42pm

Looking at the source code for the arena crate, it pushes the new object into a vector, and returns a reference to the object in the vector.

TGV · May 9, 2016, 5:32am

I've seen that too. It has just one line of unsafe magic to make that happen. That's quite acceptable, although I must admit I might not understand all the nuances of that line (still learning).

keean · May 9, 2016, 10:24am

I am not sure why it needs that unsafe code? It may be because the vector is mutable, and therefore the element could be removed from the collection. Does that mean if we had a vector with no delete operation (so you can add elements and mutate them, but not shrink the vector by deleting) that you could define this completely safely? Would the compiler recognise it as safe?

birkenfeld · May 9, 2016, 10:48am

That wouldn't be sufficient, since even when only appending, elements can move when the vector capacity is exhausted and memory has to be reallocated. That's why the arena uses a vector of vectors with preallocated, and constant, capacity.

keean · May 9, 2016, 10:51am

Of course, its easy to forget that a vector has to copy when it reaches its capacity. What happens if you heap allocate the memory for the object, and store an owning reference in the vector. This is what I would do in C++ using a vector of 'unique_ptr' to the objects. This also allows the objects to be polymorphic (differently sized) and has the same deallocation properties.

Would that be Vec<Box<T>> ?

birkenfeld · May 9, 2016, 11:17am

Yes, that is basically the same as the current Vec<Vec<T>> with all inner Vecs having a capacity of 1.

But there's another essential feature to the arena that a general "append-only vector" wouldn't help with: the alloc method has to be alloc(&self) -> &mut T, otherwise you couldn't allocate more than one object. But this also means that alloc (i.e. push for the append-only vector) has to be the only way to get at the objects stored in it, otherwise you get aliased mutable references - meaning no indexing, no iteration, etc. So in effect, the arena is already the minimal interface and there's no more basic structure beneath.

TGV · May 9, 2016, 11:30am

Funny, that's precisely where I came from...

keean · May 9, 2016, 11:41am

That is because you have to borrow the whole array mutably to append an item, so you could erase or change elements in the middle of the vector. My point was you can safely borrow the contents of an array/vector cell mutably, providing the only operation available on the vector is append, as you do not need to index or iterate the vector when it is used in an Arena. It's just a shame I can't think of a way to tell the borrow checker that.

Topic		Replies	Views
Yet another “cannot borrow `*self` as mutable because it is also borrowed as immutable” help	11	676	April 2, 2023
Rust Non-lexical lifetimes confusion help	6	404	March 21, 2022
Help with multiple mutable reference borrow because of lifetimes help	7	268	August 12, 2023
Lifetimes: storing a Vec of data and a Vec of references to that data side-by-side not working	8	669	July 9, 2022
Help with lifetimes and references	3	376	May 21, 2021

O no, not another lifetime/borrow question (and possible suggestion)

Related Topics