How to merge Slotmaps

I am planning on using a Slotmap to store the edges of a graph and be able to maintain index based references to them in my nodes.

I currently have the connections stored in a Vec which has issues with index reuse and ABA problems, etc (hence slotmap).

My one problem with this approach, is that I currently use Serde to read in multiple TOML files that can each contain my main data struct.

I need to have just one instance of the struct, so I merge the instances during initial startup which works fine with Vecs and HashMaps but I am unclear how I would do that with SlotMaps.

I currently merge the HashMap's with a helper function that checks for duplicated keys and errors if it finds any.

Any thoughts?

why can't you merge the slotmaps directly? like, iterate through the elements and insert them into a single map? after all, they are key-value storages, just like hashmaps. what prevents you from doing this? can you show some snippet what your data structure looks like?

I guess I could append to a master slotmap. I would just have to slotmap.insert() each and every item as I don't see an option to use .extend().

Current struct that would need to be merged is:

#[derive(Debug, Default, PartialEq, Serialize, Deserialize, Clone)]
#[serde(default)]
#[serde(deny_unknown_fields)]
#[non_exhaustive]
pub struct Project {
    /// contains all cables read in from files, and/or added in via program logic.
    pub cables: BTreeMap<String, cable::Cable>,
    /// `connections` contains all connections between different equipment/cables/wires.
    pub connections: Vec<connection::Connection>,
    //TODO: are connectors going to be read in separately?
    /// contains all connectors read in from files, and/or added in via program logic.
    pub connectors: BTreeMap<String, connector::Connector>,
    /// contains all enclosures read in from files, and/or added in via program logic.
    pub enclosures: BTreeMap<String, enclosure::Enclosure>,
    /// contains all equipment read in from files, and/or added in via program logic.
    pub equipment: BTreeMap<String, equipment::Equipment>,
    /// contains all mounting rails read in from files, and/or added in via program logic.
    pub mounting_rails: BTreeMap<String, mounting_rail::MountingRail>,
    /// contains all pathways read in from files and/or added in via program logic.
    pub pathways: BTreeMap<String, pathway::Pathway>,
    /// contains all term cables read in from files, and/or added in via program logic.
    pub term_cables: BTreeMap<String, term_cable::TermCable>,
    /// contains all terminal strips read in from files and/or added in via program logic.
    pub terminal_strips: BTreeMap<String, terminal_strip::TerminalStrip>,
    /// `wires` contains all wires read in from files, and/or added in via program logic.
    pub wires: BTreeMap<String, wire::Wire>,
}

I plan on changing connections to a Slotmap instead of a Vec.

My current merge() function is below:

    pub fn merge(&mut self, test_map: Project, test_file: &Path) -> Result<(), Error> {
        util_functions::merge_btreemaps(&mut self.cables, test_map.cables, test_file)?;
        self.connections.extend(test_map.connections);
        //TODO: are connectors going to be read in separately?
        //util_functions::merge_btreemaps(&mut self.connectors, test_map.connectors, test_file)?;
        util_functions::merge_btreemaps(&mut self.enclosures, test_map.enclosures, test_file)?;
        util_functions::merge_btreemaps(&mut self.equipment, test_map.equipment, test_file)?;
        util_functions::merge_btreemaps(&mut self.mounting_rails, test_map.mounting_rails, test_file)?;
        util_functions::merge_btreemaps(&mut self.pathways, test_map.pathways, test_file)?;
        util_functions::merge_btreemaps(&mut self.term_cables, test_map.term_cables, test_file)?;
        util_functions::merge_btreemaps(&mut self.terminal_strips, test_map.terminal_strips, test_file)?;
        util_functions::merge_btreemaps(&mut self.wires, test_map.wires, test_file)?;
        Ok(())
    }

And merge_btree_maps() is here:

pub fn merge_btreemaps<U, V>(origin_map: &mut BTreeMap<U, V>, test_map: BTreeMap<U, V>, test_file: &Path) -> Result<(), Error>
where
    U: Ord + Clone + ToString,
    V: FromFile,
{
    // don't need to do any work if test map is empty
    if test_map.is_empty() {
        return Ok(());
    }
    for (key, value) in test_map {
        if let Entry::Vacant(entry) = origin_map.entry(key.clone()) {
            entry.insert(value);
        } else {
            return Err(Error::DuplicateKey {
                key: key.clone().to_string(),
                origin_file: value.datafile().clone(),
                test_file: test_file.to_path_buf(),
            });
        }
    }
    Ok(())
}

that's the nature of slotmap: you are supposed to store the returned key for each inserted item. it makes little sense to insert multiple items at once, since you'll lost their keys, then the only way to access those elements would be to iterate through the entire collection, in which case other containers like Vec would be more suitable.

what's the benefit in doing so? specifically, what's the access pattern of connections? where would the keys be stored if you switched to slotmap?

you cannot merge two slotmaps without invalidating all keys. keys are only unique within a single instance of a map. this is also true for serialization and deserialization. mixing keys from different maps is not a memory safety problem, but a logic error.

I didn't see you adjust the keys for connections (indices, in case of using Vec) during the merge, so I assume the connections are used for reversed lookups?

if I missed something, please fill me in.

Its a bit messy lol. My application is a 2D electrical CAD software, so there is a lot of data and relationships that need to be managed.

The basic logic I am following currently, is as follows:

  1. Get list of files to read data from
  2. Use Serde to read data from TOML files into temporary structs
  3. Merge data from each temp struct into master structs using the merge function above. (I have 2 main structs, Library and Project).
  4. Validate library data. (Some library data contains string keys that reference other library data. This step insures everything is defined correctly).
  5. Validate project data. (Project data can contain string keys to both Library and Project data)
  6. Update data on Project data from library data (trading off increased memory usage for decreased lookups)
  7. (This is what I am working on now): Add a index reference to each Equipment/Terminal for each connection that they are linked to. This is to avoid having to iterate through the entire connection vec again. This is where slotmap comes in.
  8. Start GUI.

I am completely self educated on all this stuff, so if you have any other suggestions, I am open.

The complete project (minus a few recent updates that I haven't pushed) is here

I see, that makes sense.

I assume the keys will be fixed-up once all the data are loaded and merged? in this case, I guess the keys are stored in the nodes in memory only, they don't need to be serialized with the nodes?

due to the nature of slotmap where a key must be allocated/calculated everytime an element is inserted into the container, there's no point to provide a bulk insertion API such as extend() or extend_from_slice() like what Vec does, because slotmaps cannot do similar optimizations for such use cases.

the best you can do is to call insert() in a loop. but this isn't something to avoid in the first place.

in fact, even if you could have merged two slotmaps in bulk, you still needs to fixup the keys afterwards, so you need to iterate through the elements anyway. now you can update the keys the same time as the connections are being inserted into the map, something like this:

impl Project {
    fn merge(&mut self, other: Project) {
        for (_, mut connection) in other.connections {
            self.connections.insert_with_key(move |key| {
                connection.node1.update_connection(key);
                connection.node2.update_connection(key);
                connection
            }); 
        }
    }
}

alternatively, since this is an index to speed up lookups, you can separate it from the main data structure. it doesn't need to be persisted, and doesn't need to be merged during the data loading process. instead, rebuild the index from scratch after the data is loaded.

a pattern I use a lot is to define separate types for in-memory data and serialized data. this also allows me to control the version of the file format. I think it might be adapted and modified for your use case too. here's some contrived snippets how this pattern looks:

/// the in-memory data structure
///
/// - no need to be serializable
/// - can have non-persistent states such as reverse lookup index
/// - no need to use `Option`, default value can be calculated at load time
struct Data {
	foo: bool,
	bar: String,
	baz: Box<dyn Trait>,
}

mod persist {
	/// the on-disk data format, (ab-)use the serde enum tag for versioning
	#[derive(Debug, Serialize, Deserialize)]
	#[serde(tag = "format_version", rename_all = "lowercase")]
	enum Data {
		V1(DataV1),
		V2(DataV2),
		#[serde(other)]
		Unsupported,
	}

	#[derive(Debug, Serialize, Deserialize)]
	struct DataV1 {
		foo: Option<bool>,
	}

	#[derive(Debug, Serialize, Deserialize)]
	struct DataV2 {
		/// for convenience, can use `Deref`
		#[serde(flatten)]
		compat: DataV1,
		bar: Option<String>,
	}
}

/// the load logic, including schema validation, version conversion, index building, etc.
impl TryFrom<persist::Data> for Data {
	type Error = LoadError;

	fn try_from(value: persist::Data) -> Result<Self, LoadError> {
		use persist::Data::*;
		match value {
			V1(v1) => todo!("loaded v1"),
			V2(v2) => todo!("loaded v2"),
			Unsupported => {
				eprintln!("unsupported config file version");
				Err(())
			}
		}
	}
}

/// the save logic, may trim or compact redundent information, etc
impl From<&Data> for persist::DataV2 {
	//...
}

/// save in old format is lossy, not enabled by default
#[cfg(feature = "compat-v1")]
impl From<&Data> for persist::DataV1 {
	// ...
}

/// some convenient wrapper functions
impl Data {
	fn load(path: &Path) -> Result<Self> {
		serde_json::from_slice::<persist::Data>(&fs::read(path)?)?.try_into()
	}

	fn persist(&self, path: &Path) -> io::Result<()> {
		let config = persist::Data::V2(self.into());
		fs::write(path, serde_json::to_string(&config).expect("deserialize"))
	}

	#[cfg(feature = "compat-v1")]
	fn persist_compat(&self, path: &Path) {
		let config = persist::Data::V1(self.into());
		fs::write(path, serde_json::to_string(&config).expect("deserialize"))
	}
}

Correct

I used a similar approach in an earlier version of this application, but I was using Rc<Refcell<>> instead of indexing and it got incredibly complex to maintain. (I was also doing all the validation and merging in one massive super nested function which didn't help).

I think I can work with my existing data structures and slot map for now, but I will definitely add splitting up the structs into in-memory and persist versions to my TODO list.

(Trying to avoid another massive refactor so I can actually make progress on this project LOL.)

Thanks!

Edit:

I did actually go back and split out the datatypes into separate file and in_memory versions, which allowed me to remove several Options in my in_memory versions, simplifying other portions of my code dramatically.