Using a BytesMut and a Rayon par iter

So far, I have managed to do what I need with unsafe code (each iteration of the parallel iterator has to write to a specific index of a BytesMut):

    let port_range = 2usize;
    let sel_array = &[10, 20];
    let my_array = &mut [0, 3, 3, 45, 6, 8];

    let mut ret = BytesMut::with_capacity(3);
    // Must use a Unique in order to send ptr between threads
    let ptr = unsafe {Unique::new_unchecked(ret.as_mut_ptr())};

    let my_ret: Vec<()> = my_array.as_parallel_slice_mut().par_chunks_mut(port_range).enumerate().map(move |(idx, arr)| unsafe {
        let get_idx = idx % port_range;
        (*ptr.as_ptr().offset(idx as isize)) = arr[0] + arr[1] + sel_array[get_idx];
    }).collect();

    // Increment the cursor by the expected number of bytes written to account for offset
    unsafe { ret.advance_mut((my_array.len() / port_range) as usize);}

    for r in ret {
        println!("val: {}", r);
    }

My question is: supposing NOTHING other than the par_iter modifies the BytesMut, is this method reliable and safe enough?

Thanks

Also: How does the system know at runtime that a series of memory addresses belongs to a pointer? Does the allocation algorithm "bind" a Layout size to an initial pointer capable of a certain offset which has a maximum of the Layout's size?

What brings you to the conclusion that it knows this information? I don't think it does...

When you allocate, you pass in a Layout and get back a *mut u8. See std::alloc. This is where I got the idea from. But I've never studied low-level theory, so my inferences are based on fairly small pieces of surface-level information.
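To make the Layout point concrete, here is a minimal sketch using only std::alloc. The function name `roundtrip` is made up for illustration. As far as the language is concerned, the returned `*mut u8` is just an address: nothing attached to the pointer records the size, so staying within the 3 bytes is the caller's job, and the same Layout has to be handed back to `dealloc`. An allocator may keep its own bookkeeping internally, but that is an implementation detail you cannot rely on.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Allocates 3 bytes, writes 0, 10, 20 into them, and returns the last one.
fn roundtrip() -> u8 {
    // The Layout (size + alignment) is something *we* hold on to.
    let layout = Layout::array::<u8>(3).expect("layout should be valid");
    unsafe {
        let ptr = alloc(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // No bounds check happens here: `ptr.add(3)` would compile just
        // the same, and writing through it would be undefined behavior.
        for i in 0..3 {
            ptr.add(i).write((i as u8) * 10);
        }
        let last = ptr.add(2).read();
        // We must pass the same layout back; the pointer alone is not enough.
        dealloc(ptr, layout);
        last
    }
}
```

So in the original snippet, nothing stops `offset(idx)` from running past the capacity of the BytesMut; it only works because the capacity happens to match the number of chunks.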

Just out of curiosity, have you written benchmarks to compare this unsafe version against the more naive version of using a for loop? If you're just iterating over an array and doing a bit of math before writing the result to a buffer, you may find that the single-threaded version is faster (due to parallelism overheads, caching effects, the prefetcher, etc) for your use case.
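For reference, a rough sketch of what such a comparison could start from. `serial_version` and `time_it` are hypothetical names, and it assumes u8 elements, `port_range >= 2`, `sel` holding at least `port_range` entries, and `runs > 0`. Timing with Instant is crude; a proper benchmark harness would give more trustworthy numbers.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Single-threaded baseline: same math as the parallel closure.
fn serial_version(input: &[u8], sel: &[u8], port_range: usize) -> Vec<u8> {
    input
        .chunks_exact(port_range)
        .enumerate()
        .map(|(idx, chunk)| chunk[0] + chunk[1] + sel[idx % port_range])
        .collect()
}

/// Runs `f` `runs` times and returns the average wall-clock time per run.
fn time_it<T>(runs: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..runs {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed() / runs
}
```

Calling `time_it(1000, || serial_version(&data, &sel, 2))` and the same for the parallel version, over realistic input sizes, should show where the crossover point is.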

This is an alternative implementation using no unsafe. The only difference is we collect into a Vec instead of getting each thread to write the result to the same buffer.

    fn safe_version() {
        let port_range = 2 as usize;
        let sel_array = &[10, 20];
        let mut my_array = &mut [0, 3, 3, 45, 6, 8];

        let ret: Vec<_> = my_array
            .as_parallel_slice_mut()
            .par_chunks_mut(port_range)
            .enumerate()
            .map(move |(idx, arr)| {
                let get_idx = idx % port_range;
                arr[0] + arr[1] + sel_array[get_idx]
            })
            .collect();

        println!("Safe version: {:?}", ret);
    }

(playground)
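If the results really do have to be written by index into a buffer that already exists, that can also be done without unsafe by handing each worker its own disjoint `&mut` slice of the output. The sketch below shows the idea with std scoped threads only, so it runs without extra crates. `sum_chunks` and `workers` are made-up names, and it assumes `workers >= 1`, `port_range >= 2`, and `sel` holding at least `port_range` entries. With Rayon, I'd expect the same shape to come from zipping `par_iter_mut()` over the output with `par_chunks()` over the input, but treat that as a suggestion to try rather than tested code.

```rust
use std::thread;

/// Sums each `port_range`-sized chunk of `input` plus a selector value,
/// writing result `idx` into slot `idx` of the output buffer.
fn sum_chunks(input: &[u8], sel: &[u8], port_range: usize, workers: usize) -> Vec<u8> {
    let n_out = input.len() / port_range;
    let mut out = vec![0u8; n_out];
    // Number of output slots each worker owns (rounded up, at least 1).
    let per_worker = ((n_out + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // chunks_mut yields non-overlapping &mut slices, so the borrow
        // checker can see that no two threads touch the same slot.
        let parts = out
            .chunks_mut(per_worker)
            .zip(input.chunks(per_worker * port_range));
        for (w, (out_part, in_part)) in parts.enumerate() {
            s.spawn(move || {
                let pairs = out_part.iter_mut().zip(in_part.chunks(port_range));
                for (i, (slot, chunk)) in pairs.enumerate() {
                    // Global index of this slot, as in the original closure.
                    let idx = w * per_worker + i;
                    *slot = chunk[0] + chunk[1] + sel[idx % port_range];
                }
            });
        }
    });
    out
}
```

The design choice here is that the output is split up front rather than shared through a raw pointer, so the "nothing else modifies the buffer" condition is enforced by the compiler instead of assumed.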

Okay, that is good to know!

Actually, there is a junction before these steps that decides whether to use the parallel version or the serial version. Already on top of that part!