Solving a multi-constraints map problem

Hi there!

I'm currently building an audio server where I need, at one point, to store data about the tracks detected on the disk.

I need to have the following:

  • A way to access a track from its ID (which is a string)
  • A way to get the tracks' ID in order (the sorting algorithm is custom-made from the tracks' informations itself, which are not stored in the ID)

Basically, I would need something like this:

struct Index {
    tracks: MagicalMapType<TrackID, Track>
}

impl Index {
  pub fn get_track_from_id<'a>(&'a self, id: &TrackID) -> Option<&'a Track> {
    self.tracks.get(id)
  }

  pub fn get_sorted_track_ids<'a>(&'a self, from: usize, take: usize) -> Vec<&'a Track> {
    self.tracks.iter_with_my_custom_track_sort_algorithm()
        .skip(from).take(take).collect()
  }
}

There can be dozens of thousands of files, and it needs to be really quick even on modest hardware. Which means I can't use a Vec but it would be to slow to get a track from its ID, but I cannot use a HashMap or BTreeMap either because I could not choose the sorting order.
And I cannot use both side-by-side because I cannot have the track's data in two places at once, unless I use some references which is not possible because the state object cannot have a lifetime (contraint due to other manipulations and libraries I use).

Do you know how I could solve this problem?

Thanks in advance for your help!

You absolutely can. Write a wrapper type around the ID and equip it with a custom Ord implementation.

I can't, because the ordering is not performed from the ID itself but from the track's informations.
So the keys must be ordered depending on their associated value.

Would something like this work for you? (untested)

struct Index {
    tracks: HashMap<TrackID, Track>
    order: Vec<TrackID>
}

impl Index {
  pub fn get_track_from_id<'a>(&'a self, id: &TrackID) -> Option<&'a Track> {
    self.tracks.get(id)
  }

  pub fn get_sorted_tracks<'a>(&'a self, from: usize, take: usize) -> Vec<&'a Track> {
    self.order.iter().map(|id| self.get_track_from_id(id)).collect()
  }

  pub fn insert(&mut self, track:Track) {
    match self.order.binary_search_by(|id| self.tracks[id].my_custom_cmp(&track)) {
      Ok(idx) => unimplemented!(), // ID already inserted
      Err(idx) => {
        self.order.insert(idx, track.id);
        self.tracks.insert(track.id, track);
    }
  }
}
2 Likes

That's exactly how I implemented it, but this means I have to store the IDs two times and have to update two data structures instead of one, which is the problem.

There are a few multi-dimensional data structures, like R-trees, but I'm not familiar with any Rust crates for them. Unless you need to bound multiple dimensions in the same query, they're likely worse than mantaining several single-dimension indices.

1 Like

https://crates.io/crates/rstar

1 Like

Thank you, I'll look into that!

just use BTeeMap is ok
here is the code

code here

3 Likes

Very clever, I didn't think about that.
Thank you!

I finally solved my problem by creating my own map type. Here is the code:

use std::{
    collections::{hash_map::Keys, HashMap},
    hash::Hash,
    slice::Iter,
};

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct SortedMap<K: Eq + Hash, V: Ord> {
    values: Vec<V>,
    indexes: HashMap<K, usize>,
}

impl<K: Eq + Hash, V: Ord> SortedMap<K, V> {
    pub fn from_vec(mut values: Vec<V>, value_index: impl Fn(&V) -> K) -> Self {
        values.sort();

        let indexes = values
            .iter()
            .enumerate()
            .map(|(i, value)| (value_index(value), i))
            .collect();

        Self { values, indexes }
    }

    pub fn from_hashmap(map: HashMap<K, V>) -> Self {
        let mut entries: Vec<_> = map.into_iter().collect();
        entries.sort_by(|(_, a), (_, b)| a.cmp(b));

        let mut values = Vec::with_capacity(entries.len());
        let mut indexes = HashMap::with_capacity(entries.len());

        for (i, (key, value)) in entries.into_iter().enumerate() {
            values.push(value);
            indexes.insert(key, i);
        }

        Self { values, indexes }
    }

    pub fn empty() -> Self {
        Self {
            values: vec![],
            indexes: HashMap::new(),
        }
    }

    pub fn contains_key(&self, key: &K) -> bool {
        self.indexes.contains_key(key)
    }

    pub fn get<'a>(&'a self, key: &K) -> Option<&'a V> {
        let index = self.indexes.get(key)?;
        Some(self.values.get(*index).unwrap())
    }

    pub fn get_index(&self, key: &K) -> Option<usize> {
        self.indexes.get(key).copied()
    }

    pub fn keys(&self) -> Keys<K, usize> {
        self.indexes.keys()
    }

    pub fn values(&self) -> Iter<V> {
        self.values.iter()
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }
}

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.