Sorted Array/Vector Container


#1

Has anybody put together a wrapper around std::vec::Vec that provides automatic sorting of its contents?

Such a zero-space-overhead wrapper container typically limits the operations possible to a set-like interface.

Typical members are insert() -> bool and contains() -> bool which both should use binary search. Insertion returns true if an element was added.

Two variants of this container is typically useful; a set-like (with no duplicates) and the one allowing duplicates.


#2

For the set-like interface collections::BTreeSet might be what you’re looking for. Documentation here.

It implements both insert(&mut self, val: T) -> bool and contains(&self, val: &T) -> bool (actually, the contains interface is more generic). Also, it is possible that collections::BinaryHeap may satisfy your need for a more vec-like interface, but the underlying vec is of course not sorted.


#3

What’s the memory overhead of collections::BTreeSet compared to std::vec::Vec?


#4

BTreeSet isn’t a good replacement for what @nordlow is asking for. The sorted vectors have a quite different balance of pros and cons compared to ordered tree structures.

We could add a lightweight sorted slice/sorted vector wrapper to the std library, with few set-like methods.

In D language there is SortedRange adapter. It’s produced by sorting functions. This is quite handy (and you can get the slice out of SortedRange with a release() method):

const data = foo.sort().release();

https://dlang.org/library/std/range/sorted_range.html


#5

(Warning, off topic)

We should definitely learn from D ranges in general. I mean Rust has its own awesome (in their own way) iterators , but no trait(s) yet that generalize more or less all the other operations one can do on a slice (split in half, random access split, random access index).

I mean I often do experiments in this direction but I haven’t really tried to find good traits for something “simple” as the preceding paragraph.

For example one “benchmark problem” could be: A trait that means you can treat the slice (&[T]) and its reverse with the same function.


#6

For inspiration take https://github.com/nordlow/phobos-next/blob/master/src/array_ex.d

In ~1600 lines (the rest is unittests) it implements a C+±style array container Array that supports

  • small-array optimization (via Small and Large)
  • three different kinds of assignment semantics (Assignment),
  • two optional sortednesses (via template parameter Ordering)
  • opIndex and opSlice support (currently unsafe until Walter Bright’s DIP-1000 PR for DMD has been merged)
  • less-than operator via (less compile-time parameter)
  • C-style or GC-style allocation (useGCAllocation)
  • I’ve also borrowed Rust’s naming in withElement, withCapacity, withLength :slight_smile:

The module compiles and links in debug mode in 1.3 seconds.

In that time the unittests realize 69 unique template instantiations of Array.

I would like to know how many lines of code Rust requires to realize this and how fast that compiles.

I’m not bragging. Rust has a safer language, much greater tooling and user support than D has. I’m just showcasing what remains to be accomplished in Rust to be on pair with D’s capabilities in terms of

  • compilation performance,
  • meta-programming, and
  • the concept “Design by introspection” coined by Andrei Alexandrescu.

#7

There are a lot of things sorely lacking from the current Rust standard library and/or ecosystem, and this is one of them. (either that, or I am sorely lacking the ability to find stuff on crates.io and github)

Just this week while writing what is supposed to be a little throwaway computational script, I ended up needing to cobble together something like this: (it is basically C++ multimap, stripped to the bare functionality I need)

type Key = f64;
type Value = usize;
struct SortedIndices {
    keys: Vec<Key>,
    values: Vec<Value>,
}

impl SortedIndices {
    fn new() -> Self { SortedIndices { keys: vec![], values: vec![] } }

    fn insert(&mut self, k: Key, v: Value) {
        let i = self.lower_bound(k);
        self.keys.insert(i, k); self.values.insert(i, v);
    }

    fn lower_bound(&self, k: Key) -> usize {
        match self.keys.binary_search_by(|b| k.partial_cmp(b).unwrap()) {
            Ok(x) => x, Err(x) => x,
        }
    }

    fn upper_bound(&self, k: Key) -> usize {
        let mut i = self.lower_bound(k);
        for i in i+1..self.keys.len() {
            if self.keys[i] > k { return i; }
        }
        return self.keys.len();
    }

    fn range(&self, from: Key, to: Key) -> &[Value] {
        &self.values[self.lower_bound(from)..self.upper_bound(to)]
    }
}

(I haven’t even gotten to test it yet; just writing this post I had to ninja-fix a good 20 or so mistakes in the playpen)

BTreeSet is just sad-face-making in general. Many many times I’ve wanted an ordered set of f64s and I almost always end up settling for a sorted vector since the ergonomic cost of wrapping an f64 to be Ord is just too high.