As has already been mentioned, memcpy() is normally the fastest method.
If you control how one of the buffers is allocated, you could always play tricks with virtual memory and your address space to map p1 and p2 to the same physical memory. That way, anything written through p1 is instantly visible through p2, with no copy at all.
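As a rough illustration of the aliasing idea, here is a minimal sketch that assumes a POSIX system (Linux or similar). It backs both mappings with an unlinked temporary file and maps it twice with `MAP_SHARED`; that is just one way to get two virtual addresses onto the same pages, not necessarily the best one. The helper names `map_alias_pair` and `alias_demo` are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* expose mkstemp() and ftruncate() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the same `len` bytes of backing pages at two
 * different virtual addresses. Returns 0 on success, -1 on failure. */
static int map_alias_pair(size_t len, char **p1, char **p2)
{
    char path[] = "/tmp/alias-XXXXXX";   /* assumption: /tmp is writable */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* pages live on while mapped */

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* Two MAP_SHARED mappings of the same file offset share pages. */
    void *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                           /* mappings stay valid after close */

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, len);
        if (b != MAP_FAILED) munmap(b, len);
        return -1;
    }
    *p1 = a;
    *p2 = b;
    return 0;
}

/* Writes through p1 and checks that p2 sees the bytes without a copy.
 * Returns 0 if the aliasing worked, -1 otherwise. */
static int alias_demo(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *p1, *p2;

    if (map_alias_pair(len, &p1, &p2) != 0)
        return -1;
    assert(p1 != p2);                    /* two distinct virtual addresses */

    strcpy(p1, "written through p1");
    int same = strcmp(p2, "written through p1") == 0;

    munmap(p1, len);
    munmap(p2, len);
    return same ? 0 : -1;
}
```

Note that setting this up costs several system calls (`mkstemp`, `ftruncate`, two `mmap` calls), so it only makes sense when the mapping is created once and reused.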
@Michael-F-Bryan: Yeah, I was hoping for non-memcpy tricks. However, it's probably not worth doing an mmap call for just one pair of 4 KB pages, right? This seems applicable only when we have huge contiguous blocks.
Yeah, tricks like mmap() and DMA might speed up the transfer of large chunks of data, but for only a page or two it's probably faster to just do the memcpy() than to switch into kernel space.