New to Rust first post here - want to find out if there is an easier way to do the following.
I have a simple struct that contains two fields:
pub struct Student {
id: u32,
sec_id: u32,
}
I want to store a lot of these "objects" in a collection that will let me search for a Student by the id or the sec_id.
The best I've found is a HashMap. However, this only allows for one key. Therefore I've resulted in using two HashMaps. One using the id as the key and the second using sec_id as the key.
Since I want both HashMaps to point to the same "object" I had to use Rc and Rc::clone on the Student struct to make this happen.
The complete code:
const STUDENT_NUM: u32 = 5_000_000;
pub struct Student {
id: u32,
sec_id: u32,
}
fn main() {
let mut students_id: HashMap<u32, Rc::<Student>> = HashMap::with_capacity(STUDENT_NUM as usize);
let mut students_id2: HashMap<u32, Rc::<Student>> = HashMap::with_capacity(STUDENT_NUM as usize);
for x in 0..STUDENT_NUM {
let student = Rc::new(Student{id: x, sec_id: x + STUDENT_NUM});
students_id2.insert(x, Rc::clone(&student));
students_id.insert(x + STUDENT_NUM, student);
}
}
This works fine. However, this incurs some serious memory and performance penalty.
The above code takes twice the memory and 50% more time that is needed if I just duplicated the Student struct completely and inserted it to a whole different HashMap.
My question are:
Why is Rc so memory intensive isn't it suppose to be a simple pointer with a ref counter?
Why it takes so long for the Rc to operate? More than twice the time for using the structs without it.
Is there a better solution to be able to search with two keys for the same object? Am I missing some feature of Rust that can make this simple?
I recommend making a vector and storing indexes into that vector in the two hashmaps instead. The reason Rc is expensive is that every Rc is a separate memory allocation.
Maybe this is a noob question, but if I remove a Student from a Vector wouldn't the indexes of the remaining Students in the Vector change, making the HashMaps irrelevant?
I should've mentioned that the Students can be added/removed to the collection.
It seems like you're basically wanting a database table with two indices. You're essentially implementing this using a collection to hold the data (a HashMap<u32, Student>) and an index over sec_id (I'd make it a HashMap<u32, u32> that maps sec_id -> id, though.) You might try using an in-memory database with sqlite or something like that.
The drawback of using a sec_id->id mapping is the need to do two lookups into two HashMaps when I want to do a get by sec_id. A performance penalty I wish I could overcome.
To add to the other answers, the reason your memory usage doubles isn't that Rc is expensive, but that Student is extremely cheap. Student is 2 32-bit numbers, so altogether it'll be 64 bits or 8 bytes. On a 64bit machine, that's literally the same size as a pointer. An Rc, being a separate allocation, needs at least 1 pointer and 2 counts (weak count and strong count), so even without any data it's 3 times the size of Student.
If this is your real data struct, and you don't need to mutate it, I would recommend just dropping the Rc and storing copies of the data in each HashMap. It's extremely cheap to copy your data, given that it's only 8 bytes, and if you aren't updating it, I don't see any benefit to using Rc.
With that said, the Vec solution is often a good one, especially if your real data is more complicated than an 8-byte struct.
Thanks for the reply.
I don't think that you always have to do multiple lookups.
My code in the original post does only one lookup in one HashMap to find a student (either by id or by sec_id). The only issue is memory consumption and performance of the Rc.
That's an interesting option. I'll have to check its performance in a real life scenario.
I assume it will perform nicely since there shouldn't be many remove operations on the Vector.