Transform(split) Vec into Vec of Vec

I am trying to split a Vec into a Vec of Vec.

I was thinking that maybe something like this will work:

    pub fn split_vector<T>(v: Vec<T>, chunk_size : usize, capacity: usize) -> Vec<Vec<T>> {
        let result : Vec<Vec<T>>= v.chunks(chunk_size)
            .map(|x| {
                unsafe { Vec::from_raw_parts(x.as_ptr() as *mut T, chunk_size, capacity) }
            })
            .collect();
        std::mem::forget(v);
        result
    }

    pub fn vector_into_parts<T>(v: Vec<T>, num_parts: usize) -> Vec<Vec<T>> {
        let chunk_size = (v.len() as f32 / num_parts as f32).ceil() as usize;
        
        split_vector(v,chunk_size, chunk_size)
    }

Unfortunately it causes a crash, don't know why.
The underlying problem is very simple: You have a heap zone wrapped into a Vec struct. You want to extract the heap zone and wrap it in Vec and then create a Vec with all of them.

Is there any simpler way of doing this? Possibly without unsafe code? Keep in mind the the initial Vector is HUGE and no copying should ever occur.

I've tried to run this in Miri - playground. Looks like the problem is simple: underlying malloc/free API doesn't know anything about your manipulations, so, when Rust tries to free the first chunk, system uses the pointer to the original vector and frees it all at once. Then the next chunk is coming to drop - and here it is, use-after-free. If this is the case, I'm afraid you'll have to ensure that the original Vec is recreated before drop - this would be the only safe way.

1 Like

Yeah, allocators don't allow you to split an allocation into separately managed chunks. You can borrow slices from the main Vec, but you can't create Vecs that try to claim split ownership of the memory.

3 Likes

Just to clarify, what is the problem with using the v.chunks(chunk_size) iterator directly ?

It sounds like the problem is that the Vec is huge, and making copies of all the iterated chunks is undesirable.

Perhaps an alternative would be to use VecDeque instead and drain it in pieces while reclaiming space in the process. That's about all I can think of.

fn split_vec<T>(v: Vec<T>, chunk_size: usize) -> Vec<Vec<T>> {
    use std::collections::VecDeque;

    let mut v: VecDeque<T> = v.into(); // avoids reallocating when possible

    let mut acc = Vec::new();
    while v.len() > chunk_size {
        acc.push(v.drain(0..chunk_size).collect());
        v.shrink_to_fit();
    }
    acc.push(v.into());
    acc
}
2 Likes

It sounds like the problem is that the Vec is huge, and making copies of all the iterated chunks is undesirable.

Wait. That does not sound right. The chunks iterator of a Vec does not copy anything, it just generates slices (which are implemented as pointer + len pairs, not data copies) targeting successive regions of the Vec that is being iterated across.

But if you want to consume the original Vec and turn it into many owned sub-Vecs, collecting all those chunks from the iterator will surely require a copy step.

You cannot turn one Vec into many owned sub-Vecs without copies, indeed. The design of Vec and the underlying memory allocation API will simply not allow for it.

Hence my question: why does the OP need owned sub-Vecs, as opposed to borrowed slices from the original Vec as the chunks iterator provides?

I believe that answering this question is important in order to provide a good answer to the original problem of splitting a Vec without copying its contents.

4 Likes

The thing is I want to split the original Vec into partitions, which for various reasons could then be increased in size(added elements). This is why I need Vec<Vec>.

I understand now you cannot turn a Vec into Vec<Vec> without any copying.

The fix I have for my case is instead of getting a big Vec from the input and splitting afterwards, I select the number of partitions first and push with a round robin strategy to the partitions inside the Vec.

Thank you all for your help and sorry for the late response! I will put a TL;DR in the thread for other people to read.

1 Like

TL;DR
It's not possible without copy. Alternatives are:

  1. Use slices if you don't really need to own the memory.
  2. Swap into a VecDequeue which avoids reallocating when possible
  3. When making the original array, instead of pushing in a single Vec, push with a round robin strategy into a Vec<Vec>

Quoted explanations:

Thank you all for your explanations!

2 Likes

Note that even if you could turn your Vec<T> into Vec<Vec<T>>, you 'd still need to reallocate to add an element to one of the middle Vecs, since there's clearly no additional size there -- the next memory position is the start of the next Vec's space.

Consider, then, turning your Vec<T> into a Vec<Cow<[T]>> that borrows from the original. That way it doesn't allocate until you actually modify one of the inner containers.

2 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.