Data structure with views / indexes into itself


#1

I have an in memory database like object, which is a collection of owned structs, and then multiple views into that collection, allowing fast access based upon different field types. For example, this code:

use std::collections::BTreeMap;

#[derive(Debug)]
struct Movie {
    name: &'static str,
    year: i32,
    rating: i32,
}

#[derive(Debug)]
struct MovieDb {
    list: Vec<Movie>,
    by_year: BTreeMap<i32, Vec<usize>>,
    by_rating: BTreeMap<i32, Vec<usize>>,
}

impl MovieDb {
    fn get_first_by_year(&self, year: i32) -> &Movie {
        let index = &self.by_year[&year];
        // TODO: Error handling.
        &self.list[index[0]]
    }
}

fn build_movie_db() -> MovieDb {
    let mut list = vec![];
    list.push(Movie{name: "The Thing", year: 1982, rating: 9});
    list.push(Movie{name: "The Matrix", year: 1999, rating: 10});
    list.push(Movie{name: "Memento", year: 2000, rating: 8});
    list.push(Movie{name: "Alien", year: 1979, rating: 9});

    let mut by_year = BTreeMap::new();
    let mut by_rating = BTreeMap::new();
    for (i, m) in list.iter().enumerate() {
        by_year.entry(m.year).or_insert(vec![]).push(i);
        by_rating.entry(m.rating).or_insert(vec![]).push(i);
    }

    MovieDb{list: list, by_year: by_year, by_rating: by_rating}
}

fn main() {
    let db = build_movie_db();
    println!("{:?}", db);

    let m1 = db.get_first_by_year(1999);
    println!("{:?}", m1);

    let m2 = db.get_first_by_year(1982);
    println!("{:?}", m2);

    let m3 = db.get_first_by_year(1999);
    println!("{:?}", m3);
}

This works, but it feels strange to need to store indexes as int lists. What if the owning collection isn’t a Vec, but a linked list, or something else? I understand that storing normal references isn’t possible, due to lifetimes, but I got a version working with raw pointers.

#[derive(Debug)]
struct MovieDb {
    list: Vec<Movie>,
    by_year: BTreeMap<i32, Vec<*const Movie>>,
    by_rating: BTreeMap<i32, Vec<*const Movie>>,
}

impl MovieDb {
    fn get_first_by_year(&self, year: i32) -> &Movie {
        let list = &self.by_year[&year];
        unsafe { &*list[0] }
    }
}

...

        for m in list.iter() {
            let p = m as *const Movie;
            by_year.entry(m.year).or_insert(vec![]).push(p);
            by_rating.entry(m.rating).or_insert(vec![]).push(p);
        }

Is this the best I can do? Is there no in-between data type or technique I can use to have non-owned, but lifetime-safe references to data I know already exists? What if I’m sure that my collection will never be modified while my views are being instantiated?


#2

Your other option is something like Rc, but that incurs overhead. With raw pointers, you’ll need to make sure the data structure holding the owned values has a stable address when moved (Vec does) and as you said, nothing mutates that container that could cause the raw pointers to become invalid.

A different design option might be to have a separate structure that holds a reference to MovieDb and then provides the additional views there.


#3

Thanks for the reply!

I definitely want to avoid the overhead of Rc. Is there anyway to tell if a data structure has move stability, apart from reading the docs? Maybe a trait it something?

For that second option, does that mean I would need to have a static owner of the data, or at least something with a longer lifetime? Is it possible to have the function build_movie_db written like I have above, which just returns the newly constructed object without any parameters, or do I need to pass in the owner? I tried using lifetimes a bit but couldn’t make heads or tails of how to properly use them.


#4

In rust you can not hold data and reference to data in the same struct,
but you can split storage and index, for example:

use std::collections::BTreeMap;

#[derive(Debug)]
struct Movie {
    name: &'static str,
    year: i32,
    rating: i32,
}

#[derive(Debug)]
struct MovieDb<'a> {
    by_year: BTreeMap<i32, Vec<&'a Movie>>,
    by_rating: BTreeMap<i32, Vec<&'a Movie>>,
}

impl <'a> MovieDb <'a> {
    fn get_first_by_year(&'a self, year: i32) -> &'a Movie {
        let index = &self.by_year[&year];
        // TODO: Error handling.
        &*index.first().unwrap()
    }
}

fn build_movie_db<'a>(list: &'a Vec<Movie>) -> MovieDb<'a> {

    let mut by_year = BTreeMap::new();
    let mut by_rating = BTreeMap::new();
    for m in list.iter() {
        by_year.entry(m.year).or_insert(vec![]).push(&*m);
        by_rating.entry(m.rating).or_insert(vec![]).push(&*m);
    }

    MovieDb{by_year: by_year, by_rating: by_rating}
}

fn main() {
    let list = vec![
        Movie{name: "The Thing", year: 1982, rating: 9},
        Movie{name: "The Matrix", year: 1999, rating: 10},
        Movie{name: "Memento", year: 2000, rating: 8},
        Movie{name: "Alien", year: 1979, rating: 9},
    ];
    
    let db = build_movie_db(&list);
    println!("{:?}", db);

    let m1 = db.get_first_by_year(1999);
    println!("{:?}", m1);

    let m2 = db.get_first_by_year(1982);
    println!("{:?}", m2);

    let m3 = db.get_first_by_year(1999);
    println!("{:?}", m3);
}

#5

Take a look at the owning_ref crate: https://crates.io/crates/owning_ref