Is std::net::UdpSocket suitable for high-speed data transfer?

I'm writing a program which dumps UDP packets from a 10GbE network card. I want to reach a speed of about 1 GByte/s. The size of each UDP packet is around 8 kBytes. The dumped data will be written to a RAM disk.

So my question is: is std::net::UdpSocket suitable for this purpose, or is there a better solution?

Thanks.

2 Likes

It generally takes some work to get to those speeds. I'm not very up to date with the libraries in this space, but one example is http://udt.sourceforge.net/. A few years ago I wrote Rust bindings for UDT: https://github.com/eminence/udt-rs. If you're interested, we can work on getting them updated and tested with the latest version of Rust.

I'm not aware of any pure-rust high-speed UDP libraries, but if someone wanted to work on this, it would be a great addition to the ecosystem :slight_smile:

2 Likes

Hi, and thank you for your reply.

I have briefly looked at your UDT library, but I'm not sure it fits my situation.

I'm receiving the data from a data acquisition device whose internal program I cannot modify (actually it is FPGA-based).

I find that UDT is a connection-oriented library, but my DAQ device does not perform any connection handshake; it simply sends UDP packets to a given destination address on a certain port.

I have tried to use std::net::UdpSocket to receive the data, but about 10% of the packets are lost.

Finally, I used pcap to capture the packets and reached the desired speed.

How could they differ so much?

2 Likes

Can you show us your code which uses std::net::UdpSocket? I suspect the issue is that UdpSocket does one syscall for each packet, which is less efficient than whatever libpcap does under the hood.

Also try mio: being a low-level wrapper around epoll (if you are on Linux), it lets you drain several queued packets per wakeup instead of blocking on each one.
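Roughly something like this is what I mean; a sketch only, assuming mio 0.8 with the os-poll and net features, and the port from the original question:

use std::io;

use mio::net::UdpSocket;
use mio::{Events, Interest, Poll, Token};

fn main() -> io::Result<()> {
    let mut poll = Poll::new()?;
    let mut events = Events::with_capacity(64);

    let mut socket = UdpSocket::bind("0.0.0.0:60000".parse().unwrap())?;
    poll.registry()
        .register(&mut socket, Token(0), Interest::READABLE)?;

    let mut buf = vec![0u8; 16384];
    loop {
        // Block until the kernel reports the socket readable...
        poll.poll(&mut events, None)?;
        for _event in events.iter() {
            // ...then drain every datagram that is already queued before sleeping again.
            loop {
                match socket.recv_from(&mut buf) {
                    Ok((n, _src)) => {
                        let _payload = &buf[..n];
                        // hand the payload off for processing here
                    }
                    Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => break,
                    Err(e) => return Err(e),
                }
            }
        }
    }
}

Each recv_from here is still one syscall per datagram, so this mainly helps with scheduling rather than syscall count.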

Hi, the following is my UdpSocket code:

let socket = UdpSocket::bind("0.0.0.0:60000").unwrap();
socket.set_nonblocking(false).unwrap();
let niter = 4096;
let mut buf = vec![0_u8; 16384 * niter];
let mut shift = 0_usize;
for _i in 0..niter {
    let (num_bytes, _src_addr) = socket.recv_from(&mut buf[shift..]).unwrap();
    shift += num_bytes;
    // no other code here
}
// then check the packets received to find out how many were lost

As a comparison, the following is my pcap code:

let dev = pcap::Device {
    name: dev_name.to_string(),
    desc: None,
};

let mut cap = Capture::from_device(dev)
    .unwrap()
    .timeout(1000000000)
    .buffer_size(512 * 1024 * 1024)
    .open()
    .unwrap();
cap.filter(&format!("dst port {}", port)).unwrap();
while let Ok(packet) = cap.next() {
    let data: &[u8] = &packet.data[42..];//skip header
    // many other operations including copy to a temp buffer and use a
    // crossbeam-channel to send the buffer to another thread etc.
    // I skip it here.
}

PS: I find that pnet is also worse than pcap. Even with a rather big buffer (4 GBytes) for pnet, it loses about 1% of the packets, while pcap loses fewer than 0.001%, if any.

Outside of the Linux-specific recvmmsg and sendmmsg, it is not possible to receive or transmit multiple UDP datagrams with one syscall. If you are writing portable code you will always get only one datagram per recv call.
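For completeness, here is a rough, untested sketch of what the recvmmsg path looks like from Rust, calling it through the libc crate directly (Linux only; error handling and sender addresses are omitted):

use std::net::UdpSocket;
use std::os::unix::io::AsRawFd;

const BATCH: usize = 32;
const DGRAM: usize = 16 * 1024;

fn main() -> std::io::Result<()> {
    let socket = UdpSocket::bind("0.0.0.0:60000")?;
    let fd = socket.as_raw_fd();

    // One buffer, one iovec and one mmsghdr per datagram slot in the batch.
    let mut bufs = vec![vec![0u8; DGRAM]; BATCH];
    let mut iovecs: Vec<libc::iovec> = bufs
        .iter_mut()
        .map(|b| libc::iovec {
            iov_base: b.as_mut_ptr() as *mut libc::c_void,
            iov_len: DGRAM,
        })
        .collect();
    let mut hdrs: Vec<libc::mmsghdr> = iovecs
        .iter_mut()
        .map(|iov| {
            // Zero the header, then point it at one iovec (the sender address is ignored here).
            let mut hdr: libc::mmsghdr = unsafe { std::mem::zeroed() };
            hdr.msg_hdr.msg_iov = iov as *mut libc::iovec;
            hdr.msg_hdr.msg_iovlen = 1;
            hdr
        })
        .collect();

    loop {
        // A single syscall fills up to BATCH datagrams.
        let n = unsafe {
            libc::recvmmsg(fd, hdrs.as_mut_ptr(), BATCH as u32, 0, std::ptr::null_mut())
        };
        if n < 0 {
            return Err(std::io::Error::last_os_error());
        }
        for i in 0..n as usize {
            let len = hdrs[i].msg_len as usize;
            let _payload = &bufs[i][..len];
            // hand each payload off for processing here
        }
    }
}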

What is the size of the UDP datagrams being sent? Is the link a standard 1500-byte-MTU Ethernet link, or are jumbo frames being used?

It is possible that some of the loss is due to overhead in the kernel's IP/UDP stack and limitations of the UDP socket syscall interface; also, your UDP socket has the default buffer size, while you have increased the pcap buffer size.
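To take the buffer-size difference out of the comparison, the UDP socket's receive buffer can be enlarged before the data starts flowing. A minimal sketch, assuming the socket2 crate (the kernel will clamp the request to net.core.rmem_max unless that sysctl is raised):

use std::net::{SocketAddr, UdpSocket};

use socket2::{Domain, Protocol, Socket, Type};

fn main() -> std::io::Result<()> {
    // Create the socket through socket2 so SO_RCVBUF can be enlarged before traffic arrives.
    let socket = Socket::new(Domain::IPV4, Type::DGRAM, Some(Protocol::UDP))?;
    socket.set_recv_buffer_size(64 * 1024 * 1024)?; // ask for 64 MiB of kernel buffering

    let addr: SocketAddr = "0.0.0.0:60000".parse().unwrap();
    socket.bind(&addr.into())?;

    // Convert into a plain std socket and receive as before.
    let socket: UdpSocket = socket.into();
    let mut buf = vec![0u8; 16384];
    loop {
        let (n, _src) = socket.recv_from(&mut buf)?;
        let _payload = &buf[..n];
    }
}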

1 Like

My packets have a total length of 8962 bytes (as Ethernet frames), and my 10GbE card is set to MTU=9000.

It’s likely the “stock” kernel IP/UDP stack cannot keep up with the packet ingress rate if you’re pulling a packet at a time (ethtool should have stats on the types of drops, including ones on the NIC); the way UdpSocket is set up is it’ll be copying your jumbo frames from kernel buffers to your buffer.

Instead, you’d probably want to use AF_PACKET and work with the raw packets coming off the interface. You can then create a setup whereby the kernel and your program share a buffer, and the kernel will tell you when it has filled up the buffer with packets; when that happens, you’ll want to copy them out to a background thread that does file I/O (possibly using AIO to speed up that path).

But this is a custom setup that is beyond what UdpSocket provides.
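If you stay with libpcap, which on Linux already sits on top of that AF_PACKET shared ring, the hand-off to a background writer thread could look roughly like the outline below; untested, and it assumes the pcap and crossbeam-channel crates already used in your snippet, with a placeholder device name and output path:

use std::fs::File;
use std::io::{BufWriter, Write};
use std::thread;

use pcap::Capture;

fn main() {
    // Bounded channel so the capture loop blocks (instead of ballooning memory)
    // if the writer cannot keep up.
    let (tx, rx) = crossbeam_channel::bounded::<Vec<u8>>(1024);

    // Background thread: the only place that touches the RAM disk.
    let writer = thread::spawn(move || {
        let file = File::create("/mnt/ramdisk/dump.bin").unwrap(); // placeholder path
        let mut out = BufWriter::with_capacity(8 * 1024 * 1024, file);
        for payload in rx {
            out.write_all(&payload).unwrap();
        }
        out.flush().unwrap();
    });

    // Capture loop: copy each packet out of libpcap's ring and hand it off.
    let mut cap = Capture::from_device("eth0") // placeholder device name
        .unwrap()
        .buffer_size(512 * 1024 * 1024)
        .open()
        .unwrap();
    cap.filter("dst port 60000").unwrap();

    while let Ok(packet) = cap.next() {
        // skip the 42-byte Ethernet + IPv4 + UDP header, as in the snippet above
        tx.send(packet.data[42..].to_vec()).unwrap();
    }

    drop(tx); // close the channel so the writer thread finishes
    writer.join().unwrap();
}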

2 Likes

What about setting SO_REUSEPORT on the socket and receiving the datagrams from multiple processes/threads?
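Something like the following sketch, assuming the socket2 crate on Linux; note though that for UDP the kernel spreads datagrams across the sockets by hashing source/destination addresses and ports, so a single fixed sender may still land on only one of them:

use std::net::{SocketAddr, UdpSocket};
use std::thread;

use socket2::{Domain, Protocol, Socket, Type};

fn bind_reuseport(addr: SocketAddr) -> std::io::Result<UdpSocket> {
    let socket = Socket::new(Domain::IPV4, Type::DGRAM, Some(Protocol::UDP))?;
    socket.set_reuse_port(true)?; // several sockets may bind the same addr:port
    socket.bind(&addr.into())?;
    Ok(socket.into())
}

fn main() -> std::io::Result<()> {
    let addr: SocketAddr = "0.0.0.0:60000".parse().unwrap();

    // One receiving thread per socket; the kernel picks which socket gets each datagram.
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let socket = bind_reuseport(addr).unwrap();
            thread::spawn(move || {
                let mut buf = vec![0u8; 16384];
                loop {
                    let (n, _src) = socket.recv_from(&mut buf).unwrap();
                    let _payload = &buf[..n];
                    // each worker processes its own share of the datagrams here
                }
            })
        })
        .collect();

    for w in workers {
        w.join().unwrap();
    }
    Ok(())
}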

Finally, I find that the packet drop ratio differs significantly depending on whether or not I bind the receiving thread to a CPU on the same NUMA node as the NIC.

Alas, the good old days when we could treat the computer as a black box abstracted as CPU+RAM are gone.
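For reference, the pinning step itself can be done from Rust. A minimal sketch, assuming the core_affinity crate and that you have already looked up which cores belong to the NIC's NUMA node (e.g. from /sys/class/net/<nic>/device/numa_node and lscpu):

use std::thread;

fn main() {
    // Pick a core on the NIC's NUMA node; index 0 is just a placeholder here.
    let core_ids = core_affinity::get_core_ids().expect("could not enumerate cores");
    let rx_core = core_ids[0];

    let rx_thread = thread::spawn(move || {
        // Pin the receiving thread before it starts pulling packets.
        core_affinity::set_for_current(rx_core);
        // ... socket / pcap receive loop goes here ...
    });

    rx_thread.join().unwrap();
}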