Data structure with views / indexes into itself

I have an in memory database like object, which is a collection of owned structs, and then multiple views into that collection, allowing fast access based upon different field types. For example, this code:

use std::collections::BTreeMap;

#[derive(Debug)]
struct Movie {
    name: &'static str,
    year: i32,
    rating: i32,
}

#[derive(Debug)]
struct MovieDb {
    list: Vec<Movie>,
    by_year: BTreeMap<i32, Vec<usize>>,
    by_rating: BTreeMap<i32, Vec<usize>>,
}

impl MovieDb {
    fn get_first_by_year(&self, year: i32) -> &Movie {
        let index = &self.by_year[&year];
        // TODO: Error handling.
        &self.list[index[0]]
    }
}

fn build_movie_db() -> MovieDb {
    let mut list = vec![];
    list.push(Movie{name: "The Thing", year: 1982, rating: 9});
    list.push(Movie{name: "The Matrix", year: 1999, rating: 10});
    list.push(Movie{name: "Memento", year: 2000, rating: 8});
    list.push(Movie{name: "Alien", year: 1979, rating: 9});

    let mut by_year = BTreeMap::new();
    let mut by_rating = BTreeMap::new();
    for (i, m) in list.iter().enumerate() {
        by_year.entry(m.year).or_insert(vec![]).push(i);
        by_rating.entry(m.rating).or_insert(vec![]).push(i);
    }

    MovieDb{list: list, by_year: by_year, by_rating: by_rating}
}

fn main() {
    let db = build_movie_db();
    println!("{:?}", db);

    let m1 = db.get_first_by_year(1999);
    println!("{:?}", m1);

    let m2 = db.get_first_by_year(1982);
    println!("{:?}", m2);

    let m3 = db.get_first_by_year(1999);
    println!("{:?}", m3);
}

This works, but it feels strange to need to store indexes as int lists. What if the owning collection isn't a Vec, but a linked list, or something else? I understand that storing normal references isn't possible, due to lifetimes, but I got a version working with raw pointers.

#[derive(Debug)]
struct MovieDb {
    list: Vec<Movie>,
    by_year: BTreeMap<i32, Vec<*const Movie>>,
    by_rating: BTreeMap<i32, Vec<*const Movie>>,
}

impl MovieDb {
    fn get_first_by_year(&self, year: i32) -> &Movie {
        let list = &self.by_year[&year];
        unsafe { &*list[0] }
    }
}

...

        for m in list.iter() {
            let p = m as *const Movie;
            by_year.entry(m.year).or_insert(vec![]).push(p);
            by_rating.entry(m.rating).or_insert(vec![]).push(p);
        }

Is this the best I can do? Is there no in-between data type or technique I can use to have non-owned, but lifetime-safe references to data I know already exists? What if I'm sure that my collection will never be modified while my views are being instantiated?

1 Like

Your other option is something like Rc, but that incurs overhead. With raw pointers, you'll need to make sure the data structure holding the owned values has a stable address when moved (Vec does) and as you said, nothing mutates that container that could cause the raw pointers to become invalid.

A different design option might be to have a separate structure that holds a reference to MovieDb and then provides the additional views there.

Thanks for the reply!

I definitely want to avoid the overhead of Rc. Is there anyway to tell if a data structure has move stability, apart from reading the docs? Maybe a trait it something?

For that second option, does that mean I would need to have a static owner of the data, or at least something with a longer lifetime? Is it possible to have the function build_movie_db written like I have above, which just returns the newly constructed object without any parameters, or do I need to pass in the owner? I tried using lifetimes a bit but couldn't make heads or tails of how to properly use them.

In rust you can not hold data and reference to data in the same struct,
but you can split storage and index, for example:

use std::collections::BTreeMap;

#[derive(Debug)]
struct Movie {
    name: &'static str,
    year: i32,
    rating: i32,
}

#[derive(Debug)]
struct MovieDb<'a> {
    by_year: BTreeMap<i32, Vec<&'a Movie>>,
    by_rating: BTreeMap<i32, Vec<&'a Movie>>,
}

impl <'a> MovieDb <'a> {
    fn get_first_by_year(&'a self, year: i32) -> &'a Movie {
        let index = &self.by_year[&year];
        // TODO: Error handling.
        &*index.first().unwrap()
    }
}

fn build_movie_db<'a>(list: &'a Vec<Movie>) -> MovieDb<'a> {

    let mut by_year = BTreeMap::new();
    let mut by_rating = BTreeMap::new();
    for m in list.iter() {
        by_year.entry(m.year).or_insert(vec![]).push(&*m);
        by_rating.entry(m.rating).or_insert(vec![]).push(&*m);
    }

    MovieDb{by_year: by_year, by_rating: by_rating}
}

fn main() {
    let list = vec![
        Movie{name: "The Thing", year: 1982, rating: 9},
        Movie{name: "The Matrix", year: 1999, rating: 10},
        Movie{name: "Memento", year: 2000, rating: 8},
        Movie{name: "Alien", year: 1979, rating: 9},
    ];
    
    let db = build_movie_db(&list);
    println!("{:?}", db);

    let m1 = db.get_first_by_year(1999);
    println!("{:?}", m1);

    let m2 = db.get_first_by_year(1982);
    println!("{:?}", m2);

    let m3 = db.get_first_by_year(1999);
    println!("{:?}", m3);
}
2 Likes

Take a look at the owning_ref crate: https://crates.io/crates/owning_ref

1 Like