We have:

```rust
let p1: *mut u8; // destination page
let p2: *mut u8; // source page
```

`p1` and `p2` are both aligned to 4096-byte boundaries.

We want to copy 4 KiB from `p2[0..4096]` to `p1[0..4096]`.

What is the absolute fastest way to do this?

We can assume x86_64 Linux.
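For reference, the straightforward baseline that any faster approach has to beat is a plain `memcpy` of one page. Below is a minimal sketch in Rust, assuming the two pages do not overlap. `Page` and `copy_page` are hypothetical names introduced only for illustration.

```rust
use std::ptr;

const PAGE: usize = 4096;

/// Hypothetical test type: a 4 KiB buffer that the compiler places on a
/// 4096-byte boundary (on the stack, in a `Box`, or in a `Vec`).
#[repr(C, align(4096))]
struct Page([u8; PAGE]);

/// Baseline: copy one 4 KiB page, `dst[0..4096] <- src[0..4096]`.
///
/// # Safety
/// `dst` and `src` must each be valid for `PAGE` bytes and must not overlap.
unsafe fn copy_page(dst: *mut u8, src: *const u8) {
    // Alignment is a precondition from the question; checked in debug builds only.
    debug_assert_eq!(dst as usize % PAGE, 0, "dst is not page-aligned");
    debug_assert_eq!(src as usize % PAGE, 0, "src is not page-aligned");
    // A constant-size memcpy; for 4 KiB this typically becomes a call to libc's memcpy.
    unsafe { ptr::copy_nonoverlapping(src, dst, PAGE) }
}
```

Calling `copy_page(p1, p2)` copies `p2[0..4096]` into `p1[0..4096]`. Any candidate that claims to be faster would need to be benchmarked against this on the target machine.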

EDIT: As it turns out, the 'real' problem is that we have:

```
p1_0, p1_1, p1_2, ..., p1_(n-1)
p2_0, p2_1, p2_2, ..., p2_(n-1)
```

and we want to copy 4 KiB from `p2_i` to `p1_i` (i.e. `p1_i <- p2_i`) for all `0 <= i < n`.

Basically, I want the fastest way to copy a batch of 4 KiB pages that are not necessarily contiguous.
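The batched version of the same baseline simply loops over the `(p1_i, p2_i)` pairs and copies each page independently, so the pages need not be contiguous. Again this is only a sketch, assuming no page overlaps any other; `copy_pages`, its pair-slice signature, and the `Page` type are hypothetical.

```rust
use std::ptr;

const PAGE: usize = 4096;

/// Hypothetical test type: a 4 KiB buffer aligned to a 4096-byte boundary.
#[repr(C, align(4096))]
struct Page([u8; PAGE]);

/// Baseline batch copy: for each `(dst, src)` pair, `dst[0..4096] <- src[0..4096]`.
/// Pairs are processed in order and independently of one another.
///
/// # Safety
/// Every pointer must be valid for `PAGE` bytes, and no destination page may
/// overlap any source page or any other destination page.
unsafe fn copy_pages(pairs: &[(*mut u8, *const u8)]) {
    for &(dst, src) in pairs {
        debug_assert_eq!(dst as usize % PAGE, 0, "dst is not page-aligned");
        debug_assert_eq!(src as usize % PAGE, 0, "src is not page-aligned");
        unsafe { ptr::copy_nonoverlapping(src, dst, PAGE) }
    }
}
```

Passing the pointers as explicit pairs mirrors the `p1_i <- p2_i` formulation above and makes no assumption about where the pages live relative to each other.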