Transform(split) Vec into Vec of Vec

mihaivo · November 26, 2019, 5:15pm

I am trying to split a Vec into a Vec of Vec.

I was thinking that maybe something like this will work:

    pub fn split_vector<T>(v: Vec<T>, chunk_size : usize, capacity: usize) -> Vec<Vec<T>> {
        let result : Vec<Vec<T>>= v.chunks(chunk_size)
            .map(|x| {
                unsafe { Vec::from_raw_parts(x.as_ptr() as *mut T, chunk_size, capacity) }
            })
            .collect();
        std::mem::forget(v);
        result
    }

    pub fn vector_into_parts<T>(v: Vec<T>, num_parts: usize) -> Vec<Vec<T>> {
        let chunk_size = (v.len() as f32 / num_parts as f32).ceil() as usize;
        
        split_vector(v,chunk_size, chunk_size)
    }

Unfortunately it causes a crash, don't know why.
The underlying problem is very simple: You have a heap zone wrapped into a Vec struct. You want to extract the heap zone and wrap it in Vec and then create a Vec with all of them.

Is there any simpler way of doing this? Possibly without unsafe code? Keep in mind the the initial Vector is HUGE and no copying should ever occur.

Cerber-Ursi · November 26, 2019, 5:26pm

I've tried to run this in Miri - playground. Looks like the problem is simple: underlying malloc/free API doesn't know anything about your manipulations, so, when Rust tries to free the first chunk, system uses the pointer to the original vector and frees it all at once. Then the next chunk is coming to drop - and here it is, use-after-free. If this is the case, I'm afraid you'll have to ensure that the original Vec is recreated before drop - this would be the only safe way.

FenrirWolf · November 26, 2019, 6:26pm

Yeah, allocators don't allow you to split an allocation into separately managed chunks. You can borrow slices from the main Vec, but you can't create Vecs that try to claim split ownership of the memory.

HadrienG · November 26, 2019, 7:58pm

Just to clarify, what is the problem with using the v.chunks(chunk_size) iterator directly ?

tuffy · November 26, 2019, 8:16pm

It sounds like the problem is that the Vec is huge, and making copies of all the iterated chunks is undesirable.

Perhaps an alternative would be to use VecDeque instead and drain it in pieces while reclaiming space in the process. That's about all I can think of.

fn split_vec<T>(v: Vec<T>, chunk_size: usize) -> Vec<Vec<T>> {
    use std::collections::VecDeque;

    let mut v: VecDeque<T> = v.into(); // avoids reallocating when possible

    let mut acc = Vec::new();
    while v.len() > chunk_size {
        acc.push(v.drain(0..chunk_size).collect());
        v.shrink_to_fit();
    }
    acc.push(v.into());
    acc
}

HadrienG · November 26, 2019, 9:31pm

It sounds like the problem is that the Vec is huge, and making copies of all the iterated chunks is undesirable.

Wait. That does not sound right. The chunks iterator of a Vec does not copy anything, it just generates slices (which are implemented as pointer + len pairs, not data copies) targeting successive regions of the Vec that is being iterated across.

tuffy · November 26, 2019, 9:52pm

But if you want to consume the original Vec and turn it into many owned sub-Vecs, collecting all those chunks from the iterator will surely require a copy step.

HadrienG · November 26, 2019, 9:57pm

You cannot turn one Vec into many owned sub-Vecs without copies, indeed. The design of Vec and the underlying memory allocation API will simply not allow for it.

Hence my question: why does the OP need owned sub-Vecs, as opposed to borrowed slices from the original Vec as the chunks iterator provides?

I believe that answering this question is important in order to provide a good answer to the original problem of splitting a Vec without copying its contents.

mihaivo · November 27, 2019, 10:46am

The thing is I want to split the original Vec into partitions, which for various reasons could then be increased in size(added elements). This is why I need Vec<Vec>.

I understand now you cannot turn a Vec into Vec<Vec> without any copying.

The fix I have for my case is instead of getting a big Vec from the input and splitting afterwards, I select the number of partitions first and push with a round robin strategy to the partitions inside the Vec.

Thank you all for your help and sorry for the late response! I will put a TL;DR in the thread for other people to read.

mihaivo · November 27, 2019, 10:56am

TL;DR
It's not possible without copy. Alternatives are:

Use slices if you don't really need to own the memory.
Swap into a VecDequeue which avoids reallocating when possible
When making the original array, instead of pushing in a single Vec, push with a round robin strategy into a Vec<Vec>

Quoted explanations:

tuffy:

It sounds like the problem is that the Vec is huge, and making copies of all the iterated chunks is undesirable.

Perhaps an alternative would be to use VecDeque instead and drain it in pieces while reclaiming space in the process. That's about all I can think of.
fn split_vec<T>(v: Vec<T>, chunk_size: usize) -> Vec<Vec<T>> {
    use std::collections::VecDeque;

    let mut v: VecDeque<T> = v.into(); // avoids reallocating when possible

    let mut acc = Vec::new();
    while v.len() > chunk_size {
        acc.push(v.drain(0..chunk_size).collect());
        v.shrink_to_fit();
    }
    acc.push(v.into());
    acc
}

Thank you all for your explanations!

scottmcm · November 27, 2019, 7:59pm

Note that even if you could turn your Vec<T> into Vec<Vec<T>>, you 'd still need to reallocate to add an element to one of the middle Vecs, since there's clearly no additional size there -- the next memory position is the start of the next Vec's space.

Consider, then, turning your Vec<T> into a Vec<Cow<[T]>> that borrows from the original. That way it doesn't allocate until you actually modify one of the inner containers.

system · February 25, 2020, 7:59pm

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Conversion between Vec<[T; N]> and Vec<T>? help	25	971	July 29, 2021
Vec layout / transmute help	5	543	June 15, 2022
Split owned Vec<T> without reallocation help	8	4786	January 12, 2023
Clone a part of a Vec<u8>	5	1973	January 12, 2023
How to efficiently split a Vec<u8> into Vec<Vec<u8>> and sort? community	3	1821	April 29, 2020

Transform(split) Vec into Vec of Vec

Related Topics