Ordered list backed by btree like structure?

I'm looking for an ordered list where I can efficiently:

  1. Get value at given list index
  2. Insert/Remove value at given list index
  3. Get index of a value in the list

A Vec won't work because insert/remove/index_of_item are all potentially expensive with large lists. I think a BTree like structure with parent links is what I need. In particular here's an example of a structure that does exactly what I want in Javascript from CodeMirror:

Each internal node stores the total number of items under it. And using that total makes it fast to find item at particular index, and to go backward (up through parent links and prev-siblings in tree structure) to find index of given item.

My questions:

Does this or something similar already exist in rust ecosystem or do I need to implement this myself? Any other relevant wisdom is also welcome :slight_smile:

Jesse

I've recently learned about im_rc::Vector from Concat two im_rc::Vector in O(log n) time?

It's main purpose seem to be an immutable data structure, but it seems to satisfy all of your requirements. It has all operations (even cloning and splitting!) in O(log(n)) and push/pop/iteration in O(1). The only thing missing is:

  1. Get index of a value in the list

I think it could be done by binary search, which would make it O(log² n), given O(log n) indexing.

This is not ordered! sorted so binary search would be "expensive" O(n log n) :slightly_smiling_face:

@ehsanmok Yes, you're right. But this could be used as a backing storage with a binary search on top, giving a reasonable efficiency.

What terms should I used to describe instead? Just a list? I thought "ordered" meant items are stored in particular (indexable) order, and "sorted" would mean that they are kept in a sorted order. But I really have no idea.

Yeah, that's my fault. I think I've just started reading your requirements and somehow the word "ordered" missed my eyes. :upside_down_face:

But I remember myself having similar problems and making an indexable list ordered is easier than adding "get index of element" feature to an ordered set. So I still think if you have a structure with efficient in-the-middle insertion and efficient by-index lookup, you can add a binary search on top and provide an ordered container backed by such a structure.

1 Like

And you are right... that's almost what I want... though I don't think #3 can be solved with binary search... because the items are not stored in a sorted order (maybe that's what @ehsanmok is talking about?). But I do think the underlying data structure might be able to efficiently get the index with more API... I'm going to look into that now.

Why not use Rust's BTreeMap?

https://doc.rust-lang.org/std/collections/struct.BTreeMap.html

Which presumably keeps order.

Do you really need the notion of a numerical "index"? As in [0], [1], [2], [3]...into an array.

If you ensure (also by binary search) that you're always inserting an element in a correct spot, I think you'll be able to maintain the ordered invariant.

Anyway, I recently tried to search for such a crate (basically indexable btree), but failed to find one. I think there's one in a private module of some other crate. I'll try to post a link if I could find a topic.

Yes, edited my comment above related to binary search.

how about im::OrdMap?

Thanks for all the suggestions... but I think I really do want an API like a Vec where there are no keys and where items themselves are not ordered. Instead the structure should define the order, not the items. I'll update this post with a better description of my uses cases soon.

Use Case 1

I have a large file tree and I want to maintain a list of DirEntry from that tree that match a given query. I want this list of matches ordered according to a depth first traversal of the file tree.

If this is my file tree:

  • a*
    • e
    • b*
  • d
    • c
      • f*

And my query is for "*" ... then my matches list will look like this:

  • a*
  • b*
  • f*

Over time the tree will change and post events (file changed, file inserted, file removed, etc). The code that maintains the matches list will re-run the query and needs to update the matches list. Updating the list is one of the tricky parts... because I need to figure out proper location to insert new matches... I need to be able to quickly compare depth first ordering of any two items in the file tree.

One way to do this is to maintain an ordered list (the data structure that I want) of the depth first traversal of the file tree. I keep this list updated as the tree changes and can use it to quickly compare the depth first ordering of any two items in the tree.

Use Case 2

I also want to fire events when my list of matches changes. So lets say file "b*" is changed to "b". It then gets removed from the list of matches since it no longer has a "". I wan to communicate that change through an event that says (removed item at index 1). But with a big list it's expensive to determine that "b" was located at index one...

So again for the matches list I'll also use this same ordered list data structure, so that I can efficiently update the list of matches, and post events with the indexes of items that have changed, inserted, or been removed.

What OrdMap lacks is a simple indexing (basically fn get(&self, usize) -> V)). I think you can hack indexing on top of OrdMap by using split or skip, but from the docs I'm not sure about their complexity – whether they're O(log n) or O(n). If they're log(n), OrdMap would be great!

Well ... I think what you want needs more clarifications. How much constraint no. 3 is important to your application? any flat Vec-like is doomed with O(n) search for index. Efficient BST-like (B-tree) is perhaps the most efficient but you still want something like Vec and don't want to deal with keys ... then nothing comes to my mind now :woozy_face:

Is this "array like" thing you want supposed to be fully populated, from index 0 to something. or is it only sparsely populated?

You could turn your "index" numbers into hashes with std::hash https://doc.rust-lang.org/std/index.html and use the hash to write items into the "array" which is actually a hashMap.

Wrap that up in some trait or whatever and overload the array indexing operators to give you numerically indexed access to the hashMap.

Ok, thanks for taking a look. I've built this myself a few times (Copying CodeMirror example first time, and then with minor changes) but I always thought I was probably reinventing the wheel. Good to have at least a little feedback that it's not obviously a feature set that's satisfiable with an existing crate.

Yes, fully populated just like a Vec. That's maybe the defining factor... because when items are removed any trailing items will slide back by that number of indexes. So I want remove to remove item at that index, not set it to null.

Thanks again for your help... if you come across that crate please let me know. I'll just start on my own implementation now.

offhand some variations on a skiplist might seem like a plausible implementation,
OrderedSkipList seems like it might do the trick.

I think this won't work because you can't really insert items into list. To insert item you would need to increment the hash (index) of all following items.

I don't think so, at least not through public API. It allows you to read at a given index, but doesn't seem to allow you to insert at a given index... instead it expect the values that you are inserting to be sortable.

I've updated with a few use cases here: