Multiple Implementations of a Data Structure

I am confused about how to provide an API where I have several
implementations of a container type, with differing performance
characteristics.

To give the background, I am building an crate for manipulating an
Ontology, which is a set of Axioms. There are about 30 different
kinds of Axiom all represented as a big enum. So far my
implementation of Ontology just uses a BTreeSet to hold everything
in.

That's fine so far, and has worked well for parsing data from one
format that I am working with; but for another format, the performance
is terrible, because I have to do a linear search of earlier part of
the set for new elements that I parse, so I get quadratic performance
(at least, I think it might be worse).

So, I need to have indexes somewhere to make the lookup constant
time. Straight-forward enough, I thought, but I cannot work out the
design. I would like to be able to add the indexes as I need them.

If I was doing this in Java, I might do something like create an
interface for Ontology, then implement with a number of classes,
with different characteristics. Or, I could have a base Ontology and
implement an OntologyIndex which lets me search, and hook them
together with listeners.

None of this seems right for Rust. Any thoughts or general pointers
would be helpful.

2 Likes

I'm working on a general solution for this exact problem, but it's nowhere near ready. The design I'm settling in on is a write-through index, which in your case would look something like this:

  • A basic in-memory storage solution (your current BTreeSet, or a Vec, or whatever)
  • An indexing wrapper object that owns the in-memory storage, and adjusts its index as inserts, updates, and deletes come through for the backing store
  • A trait that both implement, so that you can use the indexed storage interchangeably with the non-indexed one, including as the inner storage for another index

If you only need the index for the file reading portion, you can have a method on the indexer to destroy itself and return the internal storage.

Hi, I was doing something similar lately (although with maps instead of sets: HashMap, Vector of (key, value) pairs, BTreeMap could be also added): I wanted to be able to replace one data structure with another one to debug some suspicious memory problems.

Here's my playground, you may inspire there:

It's not made very thoroughly, I've just implemented interface I needed. But might be helpful as a staring point. I had to reach for nightly rust and generic associated types (GATs), maybe you will need them as well (depends on what part of the interface you need).

I believe that when GATs are stabilized, many such abstractions will find their way soon into the standard library.

Yeah and the unsafe pointers in the AssociationListOccupiedEntry are not necessary. See the discussion here, where @mbrubeck proposes a more elegant solution (yet maybe slightly less efficient because of the additional indexing?)

This was one conclusion that I had come to also -- a trait, a base storage and one (or more) indexes. I figured that the indexes would either be transparent or, alternatively, provide unique search methods (which work expose the indexed portion). In the latter case, you'd have to keep around a reference (or smart pointer) to the index so that you could stack several indexes on top of each other while still accessing their underlying methods. I struggled to actually implement this, though, and was hoping there was a nice example that I could plagiarise.

I was thinking about what to return from a parser. I need to use an index to make parsing efficient; I could destroy them at the end, but it seemed to make more sense to return them from the parser in case the client wants the same indexing. This does rather make then part of the parser API though which is a bit unfortunate, as it's not the sort of thing you want to guarantee.

I tried something like this, but believe from another thread on here that I also need GATs which is painful, as I'd rather avoid nightly. It might be necessary at this point though.

Working on something similar, I can empathize. Transparent caches/indices seem like the way to go for this approach, but the API design for it is not easy— I ended up teaching set algebra to the type system just to get records behaving nicely; I haven’t really started in on collections much yet.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.