Looking for an appendable, sliceable, reference-counted vec data type

I need to be able to take a &mut reference to this type and get a prefix slice that snapshots its current length, so that the prefix can be iterated over while the vec is being appended to.

The slices would be reference-counted and the data would live on the heap, so that the slices' lifetimes aren't tied to the original Vec.

The vec would be append-only unless there are no live slices of it (somewhat similar to RefCell).

Does this data structure exist, or do I need to create it myself?

My current solution is just to clone the entire vec whenever I need to iterate over it, which is not ideal.
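For illustration, this is roughly the pattern I have now (State and process_all are made-up names for this sketch):

struct State {
    items: Vec<i32>,
}

impl State {
    fn process_all(&mut self) {
        // Clone the whole vec so the iteration doesn't borrow `self.items`
        // while we append to it; this full copy is what I'd like to avoid.
        let snapshot = self.items.clone();
        for &n in &snapshot {
            if n % 2 == 0 {
                self.items.push(n + 1); // append while iterating the copy
            }
        }
    }
}

fn main() {
    let mut s = State { items: vec![1, 2, 4] };
    s.process_all();
    println!("{:?}", s.items); // [1, 2, 4, 3, 5]
}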

EDIT: the data structure does not need to be Sync; this is an issue with deep recursion, not with multithreading.

Looks like Vec::split_at_spare_mut will get you mostly there (nightly only).
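Untested sketch of what I mean (requires the nightly vec_split_at_spare feature):

#![feature(vec_split_at_spare)] // nightly only

fn main() {
    let mut v = Vec::with_capacity(8);
    v.extend([1, 2, 3, 4]);

    // `init` is the initialized prefix; `spare` is the uninitialized
    // remaining capacity, so writing to it doesn't alias `init`.
    let (init, spare) = v.split_at_spare_mut();
    let mut written = 0;
    for (src, dst) in init.iter().zip(spare.iter_mut()) {
        dst.write(*src + 10); // "append" derived values into spare capacity
        written += 1;
    }
    // SAFETY: exactly `written` spare slots were just initialized.
    unsafe { v.set_len(v.len() + written) };

    println!("{v:?}"); // [1, 2, 3, 4, 11, 12, 13, 14]
}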

No, it won't, because the initial &mut reference is actually to an outer struct, and I need to be able to reborrow that struct to call other methods on it (which may need to read the entire vec, or append to it), so I can't have any outstanding partial borrows. That's why I suggested using reference counting.

Persistent immutable data structures may solve your need. A persistent vector is cheap to clone, cheap to iterate (but not quite as cheap as a normal vector), and cheap to append.
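For example with the im crate (or im-rc for the single-threaded variant, since you don't need Sync) — a minimal sketch:

// [dependencies]
// im = "15"
use im::Vector;

fn main() {
    let mut v: Vector<i32> = (1..=4).collect();

    // Cloning is O(1): the snapshot shares structure with `v`.
    let snapshot = v.clone();
    for n in snapshot.iter() {
        v.push_back(n + 10); // appending leaves the snapshot untouched
    }

    println!("{:?}", v);        // [1, 2, 3, 4, 11, 12, 13, 14]
    println!("{:?}", snapshot); // [1, 2, 3, 4]
}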


I don’t know of any crates offhand. If you want to write your own, one option is to use OnceCell to make a shared-appendable linked list (untested sketch below).
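Something like this — Node and List are made-up names, and I haven't tried to compile it:

use std::cell::OnceCell;
use std::rc::Rc;

struct Node<T> {
    value: T,
    next: OnceCell<Rc<Node<T>>>,
}

struct List<T> {
    head: OnceCell<Rc<Node<T>>>,
}

impl<T> List<T> {
    fn new() -> Self {
        List { head: OnceCell::new() }
    }

    /// Appending only needs `&self`: walk to the first empty
    /// `next` cell and fill it.
    fn push(&self, value: T) {
        let node = Rc::new(Node { value, next: OnceCell::new() });
        let mut slot = &self.head;
        loop {
            match slot.get() {
                Some(n) => slot = &n.next,
                None => {
                    let _ = slot.set(node);
                    return;
                }
            }
        }
    }

    fn iter(&self) -> impl Iterator<Item = &T> {
        let mut cur = self.head.get();
        std::iter::from_fn(move || {
            let node = cur?;
            cur = node.next.get();
            Some(&node.value)
        })
    }
}

fn main() {
    let list = List::new();
    list.push(1);
    list.push(2);
    // Pushing while iterating is fine: both only need shared access.
    for v in list.iter() {
        if *v < 3 {
            list.push(v + 10);
        }
    }
    println!("{:?}", list.iter().collect::<Vec<_>>()); // [1, 2, 11, 12]
}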

I don't know if this exists, but one way to do this would be to have a copy of the vector for each power-of-2 capacity of which there might be a snapshot. The total size is bounded by 2n regardless of the number of snapshots (the copies sum to at most 1 + 2 + 4 + … + n < 2n). Appends only happen to the main copy; the smaller ones are always full. When the last snapshot goes out of scope, delete all smaller copies, and the vector becomes mutable again.
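A rough, untested sketch of that idea in safe code (all names made up), using a fixed-capacity Vec<OnceCell<T>> as an append-only buffer so that old, full copies can stay shared with snapshots while the main copy grows:

use std::cell::{Cell, OnceCell};
use std::iter::repeat_with;
use std::rc::Rc;

// Fixed-capacity, append-only buffer: `slots` is never reallocated,
// so it can be shared between the main copy and snapshots.
struct Buf<T> {
    slots: Vec<OnceCell<T>>,
    len: Cell<usize>,
}

impl<T> Buf<T> {
    fn with_capacity(cap: usize) -> Self {
        Buf {
            slots: repeat_with(OnceCell::new).take(cap).collect(),
            len: Cell::new(0),
        }
    }
}

struct GrowVec<T: Clone> {
    buf: Rc<Buf<T>>,
}

struct Snapshot<T> {
    buf: Rc<Buf<T>>, // keeps its (possibly outdated) buffer alive
    len: usize,
}

impl<T: Clone> GrowVec<T> {
    fn new() -> Self {
        GrowVec { buf: Rc::new(Buf::with_capacity(1)) }
    }

    fn push(&mut self, value: T) {
        let len = self.buf.len.get();
        if len == self.buf.slots.len() {
            // Buffer is full: move to a doubled buffer. The old, now-full
            // copy is freed once the last snapshot of it is dropped.
            let new = Buf::with_capacity(len * 2);
            for (i, slot) in self.buf.slots.iter().enumerate() {
                let _ = new.slots[i].set(slot.get().unwrap().clone());
            }
            new.len.set(len);
            self.buf = Rc::new(new);
        }
        let len = self.buf.len.get();
        let _ = self.buf.slots[len].set(value);
        self.buf.len.set(len + 1);
    }

    fn snapshot(&self) -> Snapshot<T> {
        Snapshot { buf: Rc::clone(&self.buf), len: self.buf.len.get() }
    }
}

impl<T> Snapshot<T> {
    fn iter(&self) -> impl Iterator<Item = &T> {
        (0..self.len).map(|i| self.buf.slots[i].get().unwrap())
    }
}

fn main() {
    let mut v = GrowVec::new();
    v.push(1);
    v.push(2);
    let s = v.snapshot();
    for n in s.iter() {
        v.push(n + 10); // appending while iterating the snapshot
    }
    println!("{:?}", v.snapshot().iter().collect::<Vec<_>>()); // [1, 2, 11, 12]
}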

If you want it to be sliceable and appendable, then you'll need a huge range of contiguous addresses, which means preallocating your upper bound as virtual address space.

On Linux you could achieve this with a relatively simple wrapper around a huge Vec::with_capacity, though using mmap is probably still a better choice to make sure the allocation goes through even with a restrictive overcommit setting.
On Windows you'd make more direct VirtualAlloc calls to do that reservation.
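For illustration, an untested sketch of the Vec::with_capacity variant; UPPER_BOUND is a made-up bound you'd have to pick, and the assert stands in for enforcing it:

const UPPER_BOUND: usize = 1 << 20;

fn main() {
    let mut v: Vec<u32> = Vec::with_capacity(UPPER_BOUND);
    let base = v.as_ptr();
    v.extend([1, 2, 3]);

    // Iterate a snapshot of the current prefix by index while appending;
    // below the preallocated capacity no reallocation can occur.
    let snapshot_len = v.len();
    for i in 0..snapshot_len {
        let n = v[i]; // copy out; avoids holding a borrow across the push
        assert!(v.len() < v.capacity(), "upper bound exceeded");
        v.push(n + 10);
    }

    // The buffer address never changed — this stability is what would
    // let a real `&[T]` prefix stay valid (with appropriate unsafe code).
    assert_eq!(base, v.as_ptr());
    println!("{v:?}"); // [1, 2, 3, 11, 12, 13]
}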

Your description is a bit vague. You mention “slice”, but it isn’t clear how contiguous you want your data to be. Literal &[T] access? Just anything that can be iterated? Or something in between (chunking)?

Appending to a “Vec” can involve reallocation, which is incompatible with (certain notions of) keeping a “slice” of existing elements. You could also give more context on your element type (is it small copyable data, or references, for example)?

Finally, when thinking about “create it myself” approaches, I also noticed it isn’t clear whether you want to support creating different prefix-slices over time, e.g. create a prefix-slice snapshot, then append some new elements, then create another prefix-slice snapshot (that contains the new elements, too) while the previously-created, shorter slice still coexists.


Anyways… for some of the possible ways I can interpret your question, I believe combining Rc with an append-only list type might be fitting. You can further wrap them if you want to keep your model where only one instance is the “original” data structure, while the others only give fixed-length, read-only views.

Choice of append-only list type

The exact choice of “append-only list type” is still a bit open. The ones I’m usually aware of are e.g. elsa::vec::FrozenVec or typed_arena::Arena; but the former limits your item type (either Copy, or a stable-deref’ing pointer type), and the latter has the wrong mutability properties [appending gives &mut Item, and then you can’t do any iterating/indexing later].

I mean… well, FrozenVec is of course great if your types do fit this use case…

I guess there’s also append_only_vec (looks like I had looked at that one before, too), but that one supports multi-threading, which is possibly overhead you don’t actually need. And even when you do have &mut access to it, it supports far fewer operations than Vec does.[1]

Implementation

For the code example, let’s actually use FrozenVec… after all, your data might be Copy (or a pointer like Box or Rc) already anyway, right? And it can give full Vec access (because it wraps an actual, contiguous Vec). The wrapper types I have in mind look roughly as follows:

[dependencies]
elsa = "1.10.0"
stable_deref_trait = "1.2.0"

mod implementation {
    use elsa::FrozenVec;
    use stable_deref_trait::StableDeref;
    use std::rc::Rc;

    pub struct SnapshotVec<T>(Rc<FrozenVec<T>>);

    impl<T> SnapshotVec<T> {
        pub fn new() -> Self {
            Self(Rc::new(FrozenVec::new()))
        }

        /// Full mutable access to underlying `Vec`.
        ///
        /// Fails if any active `Snapshot` exists.
        pub fn get_mut(&mut self) -> Option<&mut Vec<T>> {
            Rc::get_mut(&mut self.0).map(|v| v.as_mut())
        }

        pub fn snapshot(&self) -> Snapshot<T> {
            Snapshot {
                vec: Rc::clone(&self.0),
                len: self.0.len(),
            }
        }

        pub fn push(&self, val: T) {
            self.0.push(val);
        }
    }

    impl<T> From<Vec<T>> for SnapshotVec<T> {
        fn from(value: Vec<T>) -> Self {
            Self(Rc::new(FrozenVec::from(value)))
        }
    }

    impl<T> Extend<T> for SnapshotVec<T> {
        fn extend<I: IntoIterator<Item = T>>(&mut self, iter: I) {
            iter.into_iter().for_each(|i| self.push(i))
        }
    }

    pub struct Snapshot<T> {
        vec: Rc<FrozenVec<T>>,
        len: usize,
    }

    impl<T> Snapshot<T> {
        pub fn iter_copied(&self) -> impl Iterator<Item = T> + '_
        where
            T: Copy,
        {
            (0..self.len).map(|i| self.vec.get_copy(i).unwrap())
        }

        pub fn iter_deref(&self) -> impl Iterator<Item = &T::Target>
        where
            T: StableDeref,
        {
            (0..self.len).map(|i| self.vec.get(i).unwrap())
        }
    }
}
use implementation::{Snapshot, SnapshotVec};

// demonstration
fn main() {
    let v = vec![200, 400, 600, 750];

    let mut v = SnapshotVec::from(v);

    let s1 = v.snapshot();
    // add all numbers +30
    v.extend(s1.iter_copied().map(|n| n + 30));

    // now add all numbers (including the new ones) +1
    let s2 = v.snapshot();
    v.extend(s2.iter_copied().map(|n| n + 1));

    // add all ORIGINAL numbers +40
    v.extend(s1.iter_copied().map(|n| n + 40)); // s1 can still be used here

    // sort the vector
    drop((s1, s2)); // <- delete snapshots first to prevent panic
    v.get_mut().unwrap().sort_unstable();

    let s3 = v.snapshot();
    // add all odd numbers +1000
    v.extend(s3.iter_copied().filter(|n| n % 2 == 1).map(|n| n + 1000));

    // turn back into `Vec`
    drop(s3); // <- delete snapshot first to prevent panic
    let v = std::mem::take(v.get_mut().unwrap());

    // show result
    println!("{v:?}");

    // [200, 201, 230, 231, 240, 400, 401, 430, 431, 440, 600, 601, 630, 631, 640,
    // 750, 751, 780, 781, 790, 1201, 1231, 1401, 1431, 1601, 1631, 1751, 1781]
}

  1. Understandably – Vec just has a lot of code, and it would all need to be re-implemented to support the same things on chunked data.
