Rust in low latency environments

Suppose we have the following requirements:

  1. hardware budget is limited to $3k, with everything purchased new from Amazon.com

  2. machine must run x86_64 linux

  3. data must be stored to non-volatile memory

  4. storage must be >= 10 TB in total

  5. We want to design a KV store supporting two functions (see the trait sketch after this list):
    set(u64, Box<[u8; 1024 * 1024]>)
    get(u64) -> Option<Box<[u8; 1024 * 1024]>>
    All keys are u64; all values are exactly 1 MB.

  6. Assume no hardware failures (except for pulling of the power cord). Assume that after a sync, data is safely on disk.

  7. We are storing >= 10_000_000 kv pairs (follows from 4 & 5: 10 TB / 1 MB = 10_000_000).

  8. We care about both latency and throughput.
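
For concreteness, here is the interface from item 5 written out as a Rust trait (just a sketch; the trait name KvStore is mine):

    pub trait KvStore {
        /// Store a 1 MB value under `key`, overwriting any previous value.
        fn set(&mut self, key: u64, value: Box<[u8; 1024 * 1024]>);
        /// Fetch the value stored under `key`, or None if it was never set.
        fn get(&self, key: u64) -> Option<Box<[u8; 1024 * 1024]>>;
    }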

What, if any, Rust crates would be useful in building a solution to this problem?

This doesn't directly answer your question about Rust crates, but here are my thoughts.

I'm assuming you're trying to squeeze as much utilization out of each machine as possible? I'm not sure you'll get a great answer unless you're able to benchmark; there are several places I can think of where you'll find bottlenecks (a rough benchmarking sketch follows this list):

  • Storage capacity: it may turn out you overspecced the rest of the machine and run out of storage before you hit any other bottleneck.
  • IO throughput
  • IO latency
  • Network throughput
  • OS driver inefficiencies (especially as they relate to interrupt handling for networking and storage)
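
For example, a first-pass micro-benchmark for the IO side might look like this (a sketch only; bench_reads is a made-up name, and since it goes through the page cache, a serious version would also use O_DIRECT and measure writes plus sync):

    use std::fs::File;
    use std::os::unix::fs::FileExt; // positional (pread-style) reads
    use std::time::Instant;

    // Sketch: time `n` pseudo-random 1 MB positional reads from `path`.
    fn bench_reads(path: &str, file_len: u64, n: u64) -> std::io::Result<()> {
        const MB: u64 = 1024 * 1024;
        let file = File::open(path)?;
        let mut buf = vec![0u8; MB as usize];
        let slots = file_len / MB;
        let start = Instant::now();
        for i in 0..n {
            // Knuth multiplicative hash as a cheap stand-in for a PRNG.
            let slot = i.wrapping_mul(2654435761) % slots;
            file.read_exact_at(&mut buf, slot * MB)?;
        }
        let secs = start.elapsed().as_secs_f64();
        println!(
            "{} reads in {:.2}s -> {:.0} MB/s, {:.3} ms/read",
            n, secs, n as f64 / secs, secs * 1000.0 / n as f64
        );
        Ok(())
    }

Numbers from this will be optimistic because of the page cache; something like fio with direct=1 gives a more honest floor.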

Regarding latency, I don't really expect you'd have many issues with CPU scheduling latency, since there's not much CPU work involved in your description.

Will you have many concurrent requests, though? If so, I'd expect you'd want multiple IO threads, depending on how many CPUs your IO can be serviced across, which in turn probably depends on the topology of the storage you attach. A sketch of that pattern follows.
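
Something like the following is what I have in mind, as a sketch only: a fixed pool of IO threads sharing a std::sync::mpsc channel, one File handle per worker (the names ReadReq and spawn_io_pool are made up for illustration):

    use std::fs::File;
    use std::os::unix::fs::FileExt;
    use std::sync::{mpsc, Arc, Mutex};
    use std::thread;

    // Hypothetical request: read the 1 MB block at `offset`, answer on `reply`.
    struct ReadReq {
        offset: u64,
        reply: mpsc::Sender<std::io::Result<Box<[u8]>>>,
    }

    fn spawn_io_pool(path: &str, n_threads: usize) -> mpsc::Sender<ReadReq> {
        let (tx, rx) = mpsc::channel::<ReadReq>();
        // std's Receiver is single-consumer, so the workers share it via a Mutex.
        let rx = Arc::new(Mutex::new(rx));
        for _ in 0..n_threads {
            let rx = Arc::clone(&rx);
            let file = File::open(path).expect("open data file");
            thread::spawn(move || loop {
                // The Mutex guard is a temporary: it is held while this worker
                // waits for and takes one request, then dropped at the end of
                // the statement, so the read below runs without the lock.
                let req = rx.lock().unwrap().recv();
                let Ok(req) = req else { break }; // channel closed: shut down
                let mut buf = vec![0u8; 1024 * 1024].into_boxed_slice();
                let res = file.read_exact_at(&mut buf, req.offset).map(|_| buf);
                let _ = req.reply.send(res);
            });
        }
        tx
    }

With roughly one worker per CPU that can reach the storage, requests queue behind the channel rather than behind a single thread's blocked syscall.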

I agree that there is a lot here I don't know and that there are lots of moving parts, but I also think that, despite these moving parts, we can derive something close to an "optimal solution", for the following reasons:

If we look at this in terms of the "fastest algorithm", the "known lower bound", and the gap between them, then outside of disk caches that gap is already pretty tight. For example:

fastest algorithm for get:

  • do 1 seek to the right location
  • read 1 MB

known lower bound for get:

  • outside of disk caches, we have to read the entire 1 MB

fastest algorithm for set:

  • do 1 seek to the right location
  • write 1 MB

known lower bound for set:

  • even if the writes land in fast disk caches, sooner or later they have to get written to disk, so amortized we still have to write the full 1 MB

In this sense, it seems that with some diligent engineering, we should be able to build a system that comes pretty darn close to the known lower bound.
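
To make that concrete, here is a minimal sketch of such a design in plain std Rust, under some loudly-stated assumptions: an in-memory index maps each key to a fixed 1 MB slot in one big file, the index is not persisted (rebuilding or journaling it is out of scope), and I return io::Result instead of the exact signatures from item 5 so IO errors are visible:

    use std::collections::HashMap;
    use std::fs::{File, OpenOptions};
    use std::io;
    use std::os::unix::fs::FileExt; // positional pread/pwrite-style IO

    const VALUE_SIZE: usize = 1024 * 1024; // every value is exactly 1 MB

    // Sketch: each key owns one fixed 1 MB slot in a single large file.
    struct KvStore {
        file: File,
        index: HashMap<u64, u64>, // key -> slot number (not persisted here)
        next_slot: u64,
    }

    impl KvStore {
        fn open(path: &str) -> io::Result<Self> {
            let file = OpenOptions::new()
                .read(true)
                .write(true)
                .create(true)
                .open(path)?;
            Ok(KvStore { file, index: HashMap::new(), next_slot: 0 })
        }

        fn set(&mut self, key: u64, value: Box<[u8; VALUE_SIZE]>) -> io::Result<()> {
            // Overwrites reuse the key's existing slot; new keys append one.
            let slot = match self.index.get(&key) {
                Some(&s) => s,
                None => {
                    let s = self.next_slot;
                    self.next_slot += 1;
                    self.index.insert(key, s);
                    s
                }
            };
            // The "fastest algorithm" above: one positional 1 MB write...
            self.file.write_all_at(&value[..], slot * VALUE_SIZE as u64)?;
            // ...then a sync, after which the data is on disk (assumption 6).
            self.file.sync_data()
        }

        fn get(&self, key: u64) -> io::Result<Option<Box<[u8; VALUE_SIZE]>>> {
            let Some(&slot) = self.index.get(&key) else { return Ok(None) };
            let mut buf = vec![0u8; VALUE_SIZE].into_boxed_slice();
            // One positional 1 MB read: the known lower bound outside caches.
            self.file.read_exact_at(&mut buf, slot * VALUE_SIZE as u64)?;
            // Lengths match exactly, so this conversion cannot fail.
            Ok(Some(buf.try_into().unwrap()))
        }
    }

The index is the main thing this sketch sweeps under the rug: at 10M entries a HashMap<u64, u64> is a few hundred MB of RAM, which fits, but it has to be rebuilt or persisted across restarts.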
