How to make an efficient copy of bytes::Buf?

I have a struct that implements the Buf trait from the bytes crate, and I need to make a copy of it. That is, two completely separate parts of the app each need to read the whole Buf. Key points that affect the design:

  1. I cannot use any other crate
  2. I want to make as few new heap allocations as possible

So far, I have only managed to come up with this:

use bytes::{Buf, Bytes};

pub fn copy_buf(buf: &mut impl Buf) -> (Box<dyn Buf>, Box<dyn Buf>) {
    // TODO: Chain<Bytes, Box<dyn Buf>> is not really efficient - maybe implement on VecDeque<Bytes>?

    // Fast path: a single remaining chunk can be handed out as one Bytes.
    if buf.remaining() == buf.chunk().len() {
        let bytes = buf.copy_to_bytes(buf.remaining());
        return (Box::new(bytes.clone()), Box::new(bytes));
    }

    let mut byte_vec = Vec::new();

    // Here we hope that each chunk is backed by a Bytes struct, which can make an
    // efficient copy of itself via reference counting. Otherwise copy_to_bytes
    // falls back to copying the chunk into a new buffer.
    while buf.has_remaining() {
        let chunk_len = buf.chunk().len();
        let chunk_bytes = buf.copy_to_bytes(chunk_len);
        byte_vec.push(chunk_bytes);
    }

    (chain_byte_vec(&byte_vec), chain_byte_vec(&byte_vec))
}

fn chain_byte_vec(byte_vec: &[Bytes]) -> Box<dyn Buf> {
    // Build the chain back to front so that each link wraps the rest of the data.
    // Cloning a Bytes shares the underlying memory instead of copying it;
    // the Box allocated per link is the real cost here.
    let mut chunks = byte_vec.iter().rev();

    let mut current: Box<dyn Buf> = match chunks.next() {
        Some(last) => Box::new(last.clone()),
        None => return Box::new(Bytes::new()),
    };
    for bytes in chunks {
        current = Box::new(bytes.clone().chain(current));
    }

    current
}

I don't like this for the following reasons, but I'm not sure how to do it better:

  1. The copy operation is essentially destructive: it consumes the source buf, which might not be desirable. For example, what if the source Buf is implemented by some custom struct that is part of another struct, like MyContainer { my_buf: MySpecialCustomBuf }?
  2. There are still too many heap allocations.
  3. Reading a byte from the chain effectively becomes an O(n) operation, where n is the number of elements in the chain.

As for 2) and 3), these could be resolved by writing a wrapper around VecDeque<Bytes> and implementing Buf for it. As for 1), I'm not sure whether it is possible to avoid it.
Overall, this all looks too complicated, so I might be missing something simple.

What about this? It's O(1) if buf is of type Bytes.

pub fn copy_buf(buf: &mut impl Buf) -> (Bytes, Bytes) {
    let bytes = buf.copy_to_bytes(buf.remaining());
    let bytes2 = bytes.clone();
    (bytes, bytes2)
}

The first few lines of the code I posted already do this. The catch is avoiding unnecessary byte copies when the incoming Buf contains multiple chunks. Or is that considered a rare case?

Measure, I guess? Intuitively, I'd expect copying to a contiguous array and then working on that to be more efficient up to sizes on the order of megabytes, but just going through dyn Buf is probably more expensive than anything to do with how many chunks there are...