Hi, as I'm sitting here, a few TB of data (some large files, but also a butt-load of tiny ones) are slowly being pumped to another (MS-Win, sadly) fileserver. We hope it will be done in the next few days, so that we can even begin to hope to migrate the rest of the server over the weekend.
Besides Win/NTFS being terrible with lots-of-small-files, I expect latency also plays some part in this (create/open file -- ok -- write this -- ok -- write that -- ok -- ... -- close file -- ok -- <rinse&repeat>). Suggestions to run multiple multi-threaded robocopy jobs (if you can subdivide the files somehow, or are lucky enough to already have an even-enough split into sub-dirs) seem to confirm my suspicions.
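A rough back-of-the-envelope sketch of why per-file round trips could dominate with tiny files, and why parallel jobs help. The function name and every number in the usage lines are hypothetical, not measurements from this migration.

```python
def latency_cost_seconds(n_files, round_trips_per_file, rtt_seconds, parallel_jobs=1):
    """Time spent purely waiting on request/response round trips.

    Assumes round trips are strictly sequential within a job and that
    jobs overlap perfectly; both are simplifications.
    """
    return n_files * round_trips_per_file * rtt_seconds / parallel_jobs


# Hypothetical figures: 5 million files, 5 round trips each, 0.5 ms RTT.
single = latency_cost_seconds(5_000_000, 5, 0.0005)
eight_jobs = latency_cost_seconds(5_000_000, 5, 0.0005, parallel_jobs=8)
print(f"1 job: {single / 3600:.1f} h, 8 jobs: {eight_jobs / 3600:.1f} h")
```

The point is only that the waiting time scales with the file count rather than the byte count, so it is independent of link bandwidth.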
So I started thinking:
what if we side-stepped the Windows SMB protocol
let's use some fast messaging layer, so as not to bother with TCP or UDP low-level drudgery
scan the directories in advance / in another thread, to divide files into large / medium / tiny categories
large files will be transferred in multiple chunks
medium (maximum size to be determined by benchmarking) files will be transferred in one chunk / message
tiny files will be combined into one multi-file message
file attributes / permissions are sent in the message header
listening process on the destination side disassembles messages, writing out tiny / medium files, and facilitates correct assembly / write order of large file fragments
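To make the large / medium / tiny split above a bit more concrete, here is a minimal sketch of the planning step only (no network code, no attributes). All thresholds and names (`TINY_MAX`, `MEDIUM_MAX`, `CHUNK_SIZE`, `BATCH_BYTES`, `Part`, `plan_messages`) are made up for illustration; the real cut-offs would have to come from benchmarking.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

# Hypothetical thresholds, to be replaced by benchmarked values.
TINY_MAX = 64 * 1024            # files up to this size get batched together
MEDIUM_MAX = 8 * 1024 * 1024    # files up to this size go in one message
CHUNK_SIZE = 8 * 1024 * 1024    # chunk size for large files
BATCH_BYTES = 4 * 1024 * 1024   # target payload size of a tiny-file batch


@dataclass
class Part:
    path: str
    offset: int
    length: int


def plan_messages(files: Iterable[Tuple[str, int]]) -> Iterator[List[Part]]:
    """Turn (path, size) pairs into messages; a message is a list of parts."""
    batch: List[Part] = []
    batch_bytes = 0
    for path, size in files:
        if size <= TINY_MAX:
            # Tiny: accumulate into a multi-file message, flush when full.
            if batch and batch_bytes + size > BATCH_BYTES:
                yield batch
                batch, batch_bytes = [], 0
            batch.append(Part(path, 0, size))
            batch_bytes += size
        elif size <= MEDIUM_MAX:
            # Medium: one file, one message.
            yield [Part(path, 0, size)]
        else:
            # Large: one message per chunk; the offset tells the receiver
            # where to write, so chunks may arrive in any order.
            for offset in range(0, size, CHUNK_SIZE):
                yield [Part(path, offset, min(CHUNK_SIZE, size - offset))]
    if batch:
        yield batch
```

Feeding it the output of a directory scan (for example `os.scandir` results mapped to `(path, size)`) would give the sender a stream of work items; the receiving side only needs the path, offset and length of each part to write it in the right place.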
The fastest way may be to physically move a drive from one computer to another.
You're unlikely to see any benefit from trying to replace TCP. It's suboptimal in high-latency networks with high packet loss, but over a local network these problems are not relevant.
File access on Windows is surprisingly slow, so having a client there that has weird workarounds like a thread pool for closing file handles will help.
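A minimal sketch of what such a close-handle thread pool could look like, assuming the data to write is already in memory. `write_files` and `close_workers` are hypothetical names, and whether this actually helps on a given box would need measuring.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def write_files(items, dest_dir, close_workers=8):
    """Write (name, data) pairs, handing each close() to a worker pool."""
    pending = []
    with ThreadPoolExecutor(max_workers=close_workers) as closers:
        for name, data in items:
            f = open(os.path.join(dest_dir, name), "wb")
            f.write(data)
            # The main loop moves on to the next file while a worker
            # thread pays the cost of flushing and closing this handle.
            pending.append(closers.submit(f.close))
    # Leaving the with-block waits for all closes; result() re-raises errors.
    for fut in pending:
        fut.result()
    return len(pending)
```

Note that this sketch does not limit how many handles are open at once; a real client would want a cap so that a slow close path cannot exhaust handles.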
Apart from that, no cleverness is needed. You could probably just install rsync and use that. It supports chunking, compression, extended attributes.
And the metadata message parsing, pipelined send, and disassembly is quite easy to implement:
The fastest way may be to physically move a drive from one computer to another.
Single drive - perhaps. RAID array that still holds a whole lotta other volumes - nope.
For a whole volume, a disk image / network cloning would probably be fast enough.
But sadly, we need to copy just one directory / share out of a humongous volume / drive.
Wut ??? tar supports Windows ACLs ? I thought about using tar, but found it so unlikely it would support this that I haven't really checked ... It would still be single-threaded & totally un-resilient (one failure => start over), but it would be a step up