How to structure self-referencing constant data?

I have a struct which contains some values in a hash, and another hash that references those values. something like:

struct PieceDictionary<'a> {
    pieces: HashMap<PieceType, PiecePrototype>,
    promotion_index: HashMap<PieceType, &'a PiecePrototype>,
}

Now, I know some things the compiler doesn't:

  1. the values in pieces are only inserted once in the constructor and never removed.
  2. values in promotion_index are also only inserted once (in the constructor) and only refer to items in pieces.

The consequence is that I know values always live longer than their reference.

since the compiler doesn't know that, I actually can't compile this scheme. If the constructor contains a line like

instance.promotion_index.insert(plain_id, &instance.pieces.get(&adv_id));

(where plain_id and adv_id are the PieceType keys)
Then it's a borrow of instance.pieces that continues to live through the function, and instance can't be returned. It seems that the compiler identifies a situation where instance.promotion_index contains references that theoretically can become dangling (although I know they won't).

The ways around it I found:

  1. instead of holding plain values and references in the index, hold everything as Rc. This makes the compiler happy, at the cost of some reference counting during struct construction.
  2. I can create a new reference ro the result of instance.pieces.getby dereferencing a raw pointer to it, which requires unsafe block, but represents my knowledge that the compiler doesn't have.

I'm hoping for a 3, some kind of lifetime annotations or a another method to help me give my knowledge to the compiler, so it can do what it usually considers unsafe, without the overhead of Rc.

What would you do in this situation?

The main issue with option 2 is that if you were to move your PieceDictionary struct, all of the references would become immediately become dangling.

This is the main reason why self-referential structs are so difficult in Rust - it creates a kind of chicken-and-egg problem. If you move the struct first, all the references get invalidated, but you can't update the references first, because they'd have nothing to point to!

I believe (please correct me if I'm wrong) to implement this safely, you would need to enforce the invariant that PieceDictionary can never be moved - this can be achieved using the new-ish Pin type.

That said, this makes things a ton more complicated, and 99% of the time, you're better off just using Rc, or restructuring your program in such a way that you no longer need self-borrows.

1 Like

You could store the PiecePrototype in a Vec and have the HashMap store indexes into the Vec

struct PieceDictionary {
    piece_data: Vec<PiecePrototype>,
    pieces: HashMap<PieceType, usize>,
    promotion_index: HashMap<PieceType, usize>,
}

This will likely have the least overhead, and it allows efficient iteration over all the pieces.

PiecePrototype can't move, and to do anything useful with Pin you must use unsafe. So it's best to avoid it until you are squeezing out the last drops of performance.

6 Likes

Yeah, Pin is an extremely complicated API to use correctly (I definitely wouldn't trust myself to use it right now :sweat_smile:), and it's not something you should reach for unless you're 100% you need it. I mentioned it for the sake of completeness, but I'd recommend doing something like @RustyYato's suggestion.

Using array/vec indices as a substitute for pointers/references is a pretty common pattern in Rust, especially when creating data structures that don't lend themselves well to borrow checking. The consequences for misusing an index (getting the wrong item, or an out of bounds panic) are much, much lower than the consequences for misusing a pointer (segfaults, if you're lucky)!

Petgraph is a good example of a library that uses this strategy.

I've used this method before. however, it's actually a parallel, unchecked system of reference. In a way, it's like using unsafe without having to do it.

But use of indices is not susceptible to UB and thus LLVM over-optimization, unlike unsafe use of references where subtle mistakes can lead to UB and thus compiler output that is potentially garbage.

1 Like

I agree that it's less in danger in that regard, and Rust does bounds checking on vector access IIRC so it's safer than raw pointers. It has its own problems, but maybe it really is the lesser evil.

Using indices are very likely to be the easiest solution. If you promise to never move or change the hash map, you could make something that should work using raw pointers, but I'm not convinced it's worth the effort.

It doesn't seem like you would need Pin as long as you guarantee the contents of the hash maps are never changed as well.

I agree that the vec/index option is probably best. But another option if this is a create once/rarely data type would be to use internment which could leak memory for you so that you could use pure pointer accesses and avoid bounds checks. For that matter, you could just leak memory on your own and use &'static, if you aren't creating many of these. Internment would limit your leakage to unique data items. And it would speed up your hash maps and comparisons by making pointer equality the same as value equality.

2 Likes

Another option is to use a arena, like typed-arena, but this will require you to store the arena separately from the PieceDictionary.

use typed_arena::Arena;

struct PieceDictionary<'arena> {
    piece_data: &'arena Arena<PiecePrototype>,
    pieces: HashMap<PieceType, &'arena PiecePrototype>,
    promotion_index: HashMap<PieceType, &'arena PiecePrototype>,
}
3 Likes

You don't need an arena if you are storing the pieces map somewhere else, right?

#[derive(PartialEq, Eq, Hash)]
struct PieceType;
struct PiecePrototype;

struct PieceDictionary<'a> {
    pieces: &'a HashMap<PieceType, PiecePrototype>,
    promotion_index: HashMap<PieceType, &'a PiecePrototype>,
}

fn main() {
    let mut pieces = HashMap::new();
    pieces.insert(PieceType, PiecePrototype);
    let mut dict = PieceDictionary {
        pieces: &pieces,
        promotion_index: HashMap::new(),
    };
    dict.promotion_index
        .insert(PieceType, dict.pieces.get(&PieceType).unwrap());
}

I assume that you want to add pieces in the PieceDictionary, if not then yes, you can do away with the arena.

1 Like

See, here's the problem: in this way, an item of promotion_index is guarantied to live as long as the pieces hash, but not as long as one particular entry of that hash. That's what trips the problem. Maybe your example doesn't show it because all the action is in main(). If you try that in a function (say the constructor) and then try to return dict, I think you'll get the same error I got.

If I understand correctly, @RustyYato's arena proposition gets around it by making sure that individual entries get the outer lifetime, rather than the dictionary but not the entries.

Using an arena you still need a separate variable to live longer than dict. It's just that then that variable is the arena and not pieces.

1 Like

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.