Storing data and ref to data in the same struct

I am trying to learn rust by building an app which models a graph of domain concepts. The "nodes" in the concept graph are references to other structs (stored in the same graph). The intent is to process/navigate through a network of references, and then publish results by looking inside the reference into the main concept objects.

I'm getting this cryptic error below. I can make it go away by pulling the declaration of the Container object into run, but I'd like to understand a bit deeper about what rust is complaining about, and what's the idiomatic way to create a concept graph that I am trying to build?

thanks!

#![allow(dead_code)]
#![allow(unused_variables)]
use std::collections::BTreeMap;

#[derive(Debug)]
pub struct Type1 {
    id: u8,
    name: String,
}

#[derive(Debug)]
pub struct Container<'a> {
    t1coll: BTreeMap<u8, Type1>,
    t1refcoll: BTreeMap<u8, &'a Type1>,
}

impl<'a> Container<'a> {
    fn new() -> Self {
        Container {
            t1coll: BTreeMap::new(),
            t1refcoll: BTreeMap::new(),
        }
    }
}

pub fn run(c: &mut Container) {
    c.t1coll.insert(1, Type1{id: 3, name: "t11".to_string(),});
    let rt1: &Type1;

    match c.t1coll.get(&1) {
        Some(t1) => rt1 = t1,
        None => println!("Not found"),
    }
    c.t1refcoll.insert(1, rt1);
}

pub fn main() {
    let mut cont = Container::new();    
    run(&mut cont);
    println!("{:?}", cont);
}

(Playground)

Errors:

   Compiling playground v0.0.1 (/playground)
error[E0623]: lifetime mismatch
  --> src/main.rs:34:27
   |
26 | pub fn run(c: &mut Container) {
   |               --------------
   |               |
   |               these two types are declared with different lifetimes...
...
34 |     c.t1refcoll.insert(1, rt1);
   |                           ^^^ ...but data from `c` flows into `c` here

error: aborting due to previous error

For more information about this error, try `rustc --explain E0623`.
error: could not compile `playground`.

To learn more, run the command again with --verbose.

What you want is essentially a self-referential struct. This can't really be done in Rust without doing some magic like the crate RelPtr. @Yato (the author) can give you a better explanation about it than me, but essentially it takes into account that the memory address of U1 in T is always within a certain delta of another U2 in T (the delta is a function of alignment/size of each U).

RelPtr can't help in this case, but as @nologik said, doing this directly would require self-referential types. Something that Rust doesn't like very much.

There are a few safe ways to get around this, the simplest among them is to stuff things into Rc/Arc.

#[derive(Debug)]
pub struct Container {
    t1coll: BTreeMap<u8, Rc<Type1>>,

    // maybe this could be `Weak<Type1>`, depending on what you need
    t1refcoll: BTreeMap<u8, Rc<Type1>>, 
}

But this solution runs into issues if you also need to mutate Type1. To get around this you can use Rc<RefCell<Type1>> for single threaded applications, or Arc<Mutex<Type1>>/Arc<RwLock<Type1>> for multi-threaded applications (depending on what you need). But this is a huge hassle to use. Usually I try and stay away from this solely because of how much work it is to get this to work.

The second solution is to use an additional "arena"

#[derive(Debug)]
pub struct Container {
    data: Vec<Type1>,

    // these index into `data`
    t1coll: BTreeMap<u8, usize>,
    t1refcoll: BTreeMap<u8, usize>,
}

This is less of a hassle to use, but it has issues of it's own. Namely, you can't remove anything from the list! Oh no! You could fix this by using Vec<Option<Type1>>, but this is also a hassle to use, and it doesn't even prevent the unbounded growth!

You could try and fix the unbounded growth problem by cleverly reusing the memory, but there are some subtle bugs that could crop up. Namely the ABA problem. But thankfully, you don't have to be clever about it, because there is a crate for it generational-arena. This crate fixes the unbounded growth by cleverly reusing memory all without suffering the ABA problem.

use generational_arena::{Arena, Index};
#[derive(Debug)]
pub struct Container {
    data: Arena<Type1>,

    // these index into `data`
    t1coll: BTreeMap<u8, Index>,
    t1refcoll: BTreeMap<u8, Index>,
}

If you don't need an item any more just remove it from the Arena and everything will work out. Once removed, all keys that reference that item can't be used anymore, so you may want to remove those keys from the BTreeMap, but that's not necessary.

edit: removed copy-pasta <'a>

7 Likes

... but I can create the Container struct by moving the allocation of Container to run(), see here. My own understanding maybe at fault here, hence the question.

Also, the type structs are not mutable - at-least that's how I want to model.

arena looks very interesting, let me check it out.

@Yato - thanks! I tried the crate, and it has significantly simplified the design. I like how the compiler almost seems like a laconic design guru.

One further question -- how do I wrap an iterator around Container that will give me predictable order or elements? I can reuse the Arena Iterator, but that seems is not in predictable order.

You can make an immutable iterator rather easily by iterating over both the BTreeMap and using the indicies to access the Arena, the mutable iterator is harder because mutable references are actually exclusive references, which complicates things. I don't think that you can do it in safe code only. A consuming iterator can be made by iterating over the BTreeMap and removing elements from the Arena.

use std::collections::btreemap::{
    Iter as BTreeIter,
    IntoIter as BTreeIntoIter
};

struct Iter<'a> {
    map: BTreeIter<'a, u8, Index>,
    data: &'a Arena<Type1>,
}

struct IntoIter {
    map: BTreeIntoIter<u8, Index>,
    data: Arena<Type1>
}

Now that I take a closer look at your code, I think there is a better way to do this!

Since your indicies are u8, you could just store arrays instead on BTreeMaps. Then use u8 as the indicies. Now, this will suffer the ABA problem, but that may not be an issue for you. This relies on u8 being a small type, so if you aren't using u8 in your actual code, this doesn't apply.

#[derive(Debug)]
pub struct Container {
    // 256 because that way `u8` can always index it
    t1coll: [Option<Type1>; 256],
    t1refcoll: [Option<u8>; 256],
}

Then the iterators become,

type Iter<'a> = std::iter::Flatten<std::slice::Iter<'a, Option<Type1>>>;
type IterMut<'a> = std::iter::Flatten<std::slice::IterMut<'a, Option<Type1>>>;

IntoIter is a bit trickier, but you can implement it yourself with safe code. (by taking things out of the Option using Option::take)

struct IntoIter {
    data: [Option<Type1>; 256],
    index: Option<u8>,
}

If this container is too large, then Box so that it isn't too expensive to move around. (it is at least 768 bytes, which is pretty big to move around a lot), or you could just Box [Option<Type1>; 256], which would make constructing the IntoIter really cheap (but maybe that doesn't matter).

I would need more than a u8 in a real life scaled app.

Ok, I just wanted to note this just in case it worked out :slight_smile: