I'm writing an algorithm that needs the current time in order to compute durations, and I'm finding that about half of the time for a particular method call is spent getting the current time. Is there a faster way to get the current time than Instant::now() or SystemTime::now()?
Some considerations:
It's OK if the time isn't monotonic - the algorithm only needs time to be monotonic most of the time, so if the clock occasionally moves backwards, it's not a big deal.
The solution should ideally be cross-platform (although if I can solve it for a single platform, that's better than nothing).
For reference, on my Mac, I record the following benchmarks:
I managed to get it to work for newer Linux kernels using CLOCK_MONOTONIC_COARSE, which is a coarse-grained but higher-performance monotonic clock. This uses the libc crate from crates.io.
pub fn now_monotonic() -> (i64, i64) {
    let mut time = libc::timespec {
        tv_sec: 0,
        tv_nsec: 0,
    };
    // CLOCK_MONOTONIC_COARSE reads the kernel's cached tick timestamp instead
    // of the hardware clock source, trading precision for speed.
    let ret = unsafe { libc::clock_gettime(libc::CLOCK_MONOTONIC_COARSE, &mut time) };
    assert_eq!(ret, 0);
    (time.tv_sec, time.tv_nsec)
}
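Since CLOCK_MONOTONIC_COARSE is Linux-specific, the cross-platform consideration can be handled by gating it behind cfg and falling back to std elsewhere. A sketch of that idea (the `coarse_now` name and the hand-rolled extern binding are illustrative; the clock id 6 comes from the Linux kernel headers, and real code would just use the libc crate as above):

```rust
#[cfg(target_os = "linux")]
fn coarse_now() -> (i64, i64) {
    // Hand-rolled binding so this sketch stands alone without the libc crate.
    #[repr(C)]
    struct Timespec {
        tv_sec: i64,
        tv_nsec: i64,
    }
    extern "C" {
        fn clock_gettime(clk_id: i32, tp: *mut Timespec) -> i32;
    }
    const CLOCK_MONOTONIC_COARSE: i32 = 6; // from the Linux kernel headers
    let mut ts = Timespec { tv_sec: 0, tv_nsec: 0 };
    let ret = unsafe { clock_gettime(CLOCK_MONOTONIC_COARSE, &mut ts) };
    assert_eq!(ret, 0);
    (ts.tv_sec, ts.tv_nsec)
}

#[cfg(not(target_os = "linux"))]
fn coarse_now() -> (i64, i64) {
    // Portable fallback: wall-clock time from std. Slower and not monotonic,
    // but the question allows the clock to occasionally move backwards.
    use std::time::{SystemTime, UNIX_EPOCH};
    let d = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default();
    (d.as_secs() as i64, d.subsec_nanos() as i64)
}

fn main() {
    let (sec, nsec) = coarse_now();
    println!("coarse time: {}s {}ns", sec, nsec);
}
```

Note the two branches measure different epochs (boot time vs the Unix epoch), which is fine if you only ever subtract timestamps to compute durations.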
Some performance numbers (taken from a VM, so take with a grain of salt):
test ...::bench_get_monotonic_time ... bench: 44 ns/iter (+/- 57)
test ...::bench_get_system_time ... bench: 46 ns/iter (+/- 6)
test ...::bench_now_monotonic ... bench: 9 ns/iter (+/- 0)
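For anyone wanting to reproduce a comparison like this without the nightly bench harness, here is a rough std-only sketch (`ns_per_iter` is a name made up for this example; `black_box` keeps the optimizer from deleting the calls):

```rust
use std::hint::black_box;
use std::time::{Instant, SystemTime};

/// Times `f` over `iters` calls and returns the average nanoseconds per call.
fn ns_per_iter<F: FnMut()>(iters: u32, mut f: F) -> u128 {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    start.elapsed().as_nanos() / iters as u128
}

fn main() {
    let iters = 1_000_000;
    let instant_ns = ns_per_iter(iters, || {
        black_box(Instant::now());
    });
    let system_ns = ns_per_iter(iters, || {
        black_box(SystemTime::now());
    });
    println!("Instant::now:    {} ns/iter", instant_ns);
    println!("SystemTime::now: {} ns/iter", system_ns);
}
```

This is much cruder than a real benchmark harness (no warm-up, no outlier rejection), so treat its numbers as ballpark figures.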
In practice this doesn't seem to be a problem. In newer runs of the benchmark (although I didn't really change anything, so frankly I'm not sure what changed), it takes 5 ns/iter, a roughly 9-fold improvement over 44 ns/iter. By comparison, when I run the same benchmarks on a bare-metal Linux laptop, I only see about a 5-fold improvement between the CLOCK_MONOTONIC_COARSE and Instant::now() versions. So I'm fairly confident I've squeezed out whatever optimizations I can, even on the VM.
Have you tried calling libc::clock_gettime with CLOCK_MONOTONIC instead of CLOCK_MONOTONIC_COARSE? CLOCK_MONOTONIC is what's used in std::time::Instant::now().
Other than that, I can only think of inlining as the reason.
On Linux, CLOCK_MONOTONIC is significantly slower than CLOCK_MONOTONIC_COARSE (45 ns/iter vs 5 ns/iter in a Linux VM on my Mac). This is consistent with the clock_gettime(3) man page, which says about CLOCK_MONOTONIC_COARSE:
A faster but less precise version of CLOCK_MONOTONIC. Use when you need very fast, but not fine-grained timestamps.
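You can observe that precision tradeoff directly by querying each clock's resolution with clock_getres(2): CLOCK_MONOTONIC typically reports 1 ns, while CLOCK_MONOTONIC_COARSE typically reports one scheduler tick (on the order of milliseconds). A sketch, assuming the same hand-rolled binding as before instead of the libc crate (`coarse_resolution` is a made-up name, and the clock id 6 comes from the Linux kernel headers):

```rust
#[cfg(target_os = "linux")]
fn coarse_resolution() -> Option<(i64, i64)> {
    #[repr(C)]
    struct Timespec {
        tv_sec: i64,
        tv_nsec: i64,
    }
    extern "C" {
        fn clock_getres(clk_id: i32, res: *mut Timespec) -> i32;
    }
    const CLOCK_MONOTONIC_COARSE: i32 = 6; // from the Linux kernel headers
    let mut ts = Timespec { tv_sec: 0, tv_nsec: 0 };
    let ret = unsafe { clock_getres(CLOCK_MONOTONIC_COARSE, &mut ts) };
    if ret == 0 {
        Some((ts.tv_sec, ts.tv_nsec))
    } else {
        None
    }
}

#[cfg(not(target_os = "linux"))]
fn coarse_resolution() -> Option<(i64, i64)> {
    None // CLOCK_MONOTONIC_COARSE does not exist off Linux
}

fn main() {
    match coarse_resolution() {
        Some((sec, nsec)) => println!("coarse resolution: {}s {}ns", sec, nsec),
        None => println!("coarse clock unavailable on this platform"),
    }
}
```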
On x86/x86_64, you could also try using the RDTSCP/RDTSC instructions. There is a crate for that, but I've never used it, so I can't make any recommendation about it.
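For a basic TSC read you don't actually need a crate on stable Rust: the `_rdtsc` intrinsic is exposed in `core::arch::x86_64`. A minimal sketch, with the caveat that the raw cycle count still has to be converted to wall time using the TSC frequency, which this sketch does not attempt:

```rust
#[cfg(target_arch = "x86_64")]
fn read_tsc() -> u64 {
    // SAFETY: RDTSC is available on all x86_64 CPUs.
    unsafe { core::arch::x86_64::_rdtsc() }
}

#[cfg(not(target_arch = "x86_64"))]
fn read_tsc() -> u64 {
    0 // no TSC on this architecture
}

fn main() {
    let t1 = read_tsc();
    let t2 = read_tsc();
    // On a single core with an invariant TSC, successive reads increase.
    println!("tsc: {} -> {} (delta {})", t1, t2, t2.wrapping_sub(t1));
}
```

Beware that on older CPUs without an invariant TSC, or across cores whose counters aren't synchronized, the TSC can appear to jump backwards, which is why the question's tolerance for occasional non-monotonicity matters here.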