Split owned Vec<T> without reallocation

terhix · June 27, 2016, 5:16pm

Currently it's possible to split slices at index and get the parts, even mutable slices since they don't overlap. Vec itself implements a split_off method that allows the vector to be split at index, it however triggers an allocation of the new vector.

I'm interested in a method that would consume a Vec<T> and produce two owned instances of Vec<T> without the extra allocation step. Given an input vector of (pointer / capacity / length) = (N / 10 / 6), splitting it at offset 3 would produce a pair of (N / 3 / 3) and (N+3 / 7 / 3). Basically splitting the original allocated region between the two.

I'm still kind of new to how heap and stack works, is that even possible?

hanna-kruppe · June 27, 2016, 5:49pm

It is conceivable that this could be made to work, and I've occasionally wished for it as well. But as far as I know, there's currently no way to tell the allocator "Hey I got this piece of memory from you, now please pretend you gave it to me as two separate, contiguous allocations". In fact, this may be fundamentally incompatible with how some allocators work (e.g., if it stores some metadata right next to each allocation). But I wish it were possible on a best-effort basis, I just don't know if any allocators provide an interface to do it.

matklad · June 27, 2016, 6:10pm

From the practical point of view I suspect that spliting a slice would be the right solution in the majority of the cases.

An interesting point is that if you do split a vector into two vectors, then the first one does not have any spare capacity, so any push will case an allocation. If you want to avoid allocations, then you must use the first vector as essentially a slice anyway.

mbrubeck · June 27, 2016, 6:46pm

Since the allocator API doesn't allow deallocating just part of the original allocation, the returned values would need shared ownership of the buffer, using something like Rc. For example (playground):

pub struct SubVec<T> {
    buf: Rc<Vec<T>>,
    range: Range<usize>,
}

impl<T> Deref for SubVec<T> {
    type Target = [T];
    fn deref(&self) -> &[T] { &self.buf[self.range.clone()] }
}

fn split_at<T>(v: Vec<T>, i: usize) -> (SubVec<T>, SubVec<T>) {
    let n = v.len();
    assert!(i < n);
    
    let buf = Rc::new(v);
    (SubVec { buf: buf.clone(), range: 0..i },
     SubVec { buf: buf, range: i..n })
}

This has a few downsides. The contents of the Vec can't be mutated without interior mutability or unsafe code. The Rc<Vec<T>> has an unnecessary extra heap allocation and double pointer indirection. (Rc<[T]> would be a better fit, but I don't think there's any way to get one from a Vec<T> yet.) And of course, reference counting has some runtime overhead. There may be some cases where this is useful, but overall it has little advantage over a borrowed slice.

StefanoD · June 27, 2016, 8:13pm

There are cases where this can be performed very easily in divide and conquer algorithms. You just have to save the indices in variables. But I don't know how well this works on your algorithm.

terhix · June 27, 2016, 9:07pm

Yeah, most of those strings are not going to get mutated anyway, so the capacity being cut to the size isn't a problem.

I might play around with a buffer being held by an Rc and using slices otherwise for now, thanks!

StefanoD · June 27, 2016, 9:45pm

Btw: In many divide and conquer algorithm they hold slices and mutate the buffer in place. So slices and mutatation are no contradiction.

comex · June 28, 2016, 8:31am

Some allocators do that (dlmalloc). High performance allocators tend to dedicate each block of memory requested from the OS to a single allocation size, and tightly pack chunks of that size into the block with no slack space or per-allocation metadata. To determine how much space is being freed in a call to free, they look up the page the pointer belongs to in a global table. This is also incompatible with splitting.

You're probably best off making your own allocator, really. One nice thing is that mmap and its equivalents on other OSes are splittable: you can deallocate individual pages of an mmapped block. So for small allocations you could delegate to normal malloc and just reallocate upon splitting, while for large ones you could track which pages belong to what, preserving O(1). Since the split point might not be on a page boundary, you'd have to reference-count pages, though...

Topic		Replies	Views
Subvector instead of slice help	7	8408	April 3, 2020
Transform(split) Vec into Vec of Vec help	11	4228	February 25, 2020
Weird behaviour of `Vec::from_raw_parts` help	14	227	September 27, 2024
Any Such Thing as an Owned Mutable Slice? help	4	459	September 11, 2020
How can I get multiple disjoint mutable references to a vector while retaining the ability to create immutable references?	8	1040	January 12, 2023

Split owned Vec<T> without reallocation

Related topics