Multi keys for same struct. Is HashMap with Rc the best solution?

New to Rust first post here - want to find out if there is an easier way to do the following.

I have a simple struct that contains two fields:

pub struct Student {
    id: u32,
    sec_id: u32,
}

I want to store a lot of these "objects" in a collection that will let me search for a Student by the id or the sec_id.
The best I've found is a HashMap. However, this only allows for one key. Therefore I've resulted in using two HashMaps. One using the id as the key and the second using sec_id as the key.
Since I want both HashMaps to point to the same "object" I had to use Rc and Rc::clone on the Student struct to make this happen.

The complete code:

const STUDENT_NUM: u32 = 5_000_000;

pub struct Student {
    id: u32,
    sec_id: u32,
}

fn main() {
    let mut students_id: HashMap<u32, Rc::<Student>> = HashMap::with_capacity(STUDENT_NUM as usize);
    let mut students_id2: HashMap<u32, Rc::<Student>> = HashMap::with_capacity(STUDENT_NUM as usize);

for x in 0..STUDENT_NUM {
    let student = Rc::new(Student{id: x, sec_id: x + STUDENT_NUM});
    students_id2.insert(x, Rc::clone(&student));
    students_id.insert(x + STUDENT_NUM, student);
}

}

This works fine. However, this incurs some serious memory and performance penalty.
The above code takes twice the memory and 50% more time that is needed if I just duplicated the Student struct completely and inserted it to a whole different HashMap.

My question are:

  1. Why is Rc so memory intensive isn't it suppose to be a simple pointer with a ref counter?
  2. Why it takes so long for the Rc to operate? More than twice the time for using the structs without it.
  3. Is there a better solution to be able to search with two keys for the same object? Am I missing some feature of Rust that can make this simple?

Many thanks for your answers.

Sorry for my bad English and noob question.

I recommend making a vector and storing indexes into that vector in the two hashmaps instead. The reason Rc is expensive is that every Rc is a separate memory allocation.

2 Likes

Hi. Thanks for the reply.

Maybe this is a noob question, but if I remove a Student from a Vector wouldn't the indexes of the remaining Students in the Vector change, making the HashMaps irrelevant?

I should've mentioned that the Students can be added/removed to the collection.

Thanks again.

It seems like you're basically wanting a database table with two indices. You're essentially implementing this using a collection to hold the data (a HashMap<u32, Student>) and an index over sec_id (I'd make it a HashMap<u32, u32> that maps sec_id -> id, though.) You might try using an in-memory database with sqlite or something like that.

1 Like

Thanks for the reply.

The drawback of using a sec_id->id mapping is the need to do two lookups into two HashMaps when I want to do a get by sec_id. A performance penalty I wish I could overcome.

Thanks

It's a fundamental performance penalty that any solution will have, though. You'll always have to do multiple lookups for this kind of problem.

You can use swap_remove and update the index of the single student that got moved.

pub fn remove_sec(&mut self, sec: u32) {
    let idx = match self.by_sec.get(&sec) {
        Some(idx) => *idx,
        None => return,
    };
    self.students.swap_remove(idx);

    let moved = &self.students[idx];
    self.by_id.insert(moved.id, idx);
    self.by_sec.insert(moved.sec_id, idx);
}

playground

2 Likes

To add to the other answers, the reason your memory usage doubles isn't that Rc is expensive, but that Student is extremely cheap. Student is 2 32-bit numbers, so altogether it'll be 64 bits or 8 bytes. On a 64bit machine, that's literally the same size as a pointer. An Rc, being a separate allocation, needs at least 1 pointer and 2 counts (weak count and strong count), so even without any data it's 3 times the size of Student.

If this is your real data struct, and you don't need to mutate it, I would recommend just dropping the Rc and storing copies of the data in each HashMap. It's extremely cheap to copy your data, given that it's only 8 bytes, and if you aren't updating it, I don't see any benefit to using Rc.

With that said, the Vec solution is often a good one, especially if your real data is more complicated than an 8-byte struct.

3 Likes

Have you looked at bimap?

Thanks for the reply.
I don't think that you always have to do multiple lookups.
My code in the original post does only one lookup in one HashMap to find a student (either by id or by sec_id). The only issue is memory consumption and performance of the Rc.

That's an interesting option. I'll have to check its performance in a real life scenario.
I assume it will perform nicely since there shouldn't be many remove operations on the Vector.

Many thanks.

The second lookup in this case comes from the Rc containing a pointer into an arbitrary location in memory and having to fetch that memory location.

4 Likes