I have a contiguous buffer of bytes like the following:
[x1 y1 x2 y2 x3 y3 ...]
and I need to split it into two separate buffers
[x1 x2 x3 ...] and [y1 y2 y3 ...]
where the two output buffers are not adjacent in memory.
Are there any methods that can make this process fast, perhaps using SIMD (there is no actual computation here, just shuffling) or DMA (something like memcpy)?
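For reference, the operation described above can be written as a plain scalar loop. This is only a minimal sketch of a baseline to compare against, not a fast solution; the function name `deinterleave` and its parameters are hypothetical, and it assumes the input holds a whole number of (x, y) byte pairs and that the buffers do not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar baseline (names are illustrative, not from any library).
   Splits src = [x1 y1 x2 y2 ...] into xs = [x1 x2 ...] and ys = [y1 y2 ...].
   pairs is the number of (x, y) pairs, so src must hold 2 * pairs bytes. */
void deinterleave(const uint8_t *src, uint8_t *xs, uint8_t *ys, size_t pairs)
{
    for (size_t i = 0; i < pairs; ++i) {
        xs[i] = src[2 * i];     /* even offsets hold the x bytes */
        ys[i] = src[2 * i + 1]; /* odd offsets hold the y bytes */
    }
}
```

Called as `deinterleave(src, xs, ys, n / 2)` for an `n`-byte input, this writes each output buffer sequentially, so `xs` and `ys` can live anywhere in memory.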
EDIT (for old question): I should say that I actually want to do the opposite: I want to de-interleave.
EDIT2: I am re-writing the question to make it clearer