Fast get current time?

I'm writing an algorithm that requires the current time in order to compute durations, and I'm finding that about half of the time for a particular method call is spent getting the current time. I'm wondering if there's a faster way of getting the current time than Instant::now() or SystemTime::now()?

Some considerations:

  • It's OK if the time isn't monotonic - the algorithm only needs time to be monotonic most of the time, so if the clock occasionally moves backwards, it's not a big deal.
  • The solution should ideally be cross-platform (although if I can solve it for a single platform, that's better than nothing).

For reference, on my Mac, I record the following benchmarks:

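#![feature(test)] // #[bench] is unstable, so this needs a nightly toolchain
extern crate test;

use std::time::{Instant, SystemTime};
use test::Bencher;

#[bench]
fn bench_get_monotonic_time(b: &mut Bencher) {
    b.iter(|| Instant::now());
}

#[bench]
fn bench_get_system_time(b: &mut Bencher) {
    b.iter(|| SystemTime::now());
}

test ...::bench_get_monotonic_time ... bench:  19 ns/iter (+/- 2)
test ...::bench_get_system_time    ... bench:  42 ns/iter (+/- 7)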
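
For anyone without a nightly toolchain, roughly the same measurement can be approximated on stable Rust. The sketch below is only an illustration: time_per_call is a hypothetical helper (not part of any library), and a plain loop like this is noisier than a real benchmark harness.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Average cost of one call to `f`, in nanoseconds, over `iters` calls.
/// `time_per_call` is a hypothetical helper name used for illustration.
pub fn time_per_call<T>(iters: u32, mut f: impl FnMut() -> T) -> f64 {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the call.
        black_box(f());
    }
    // Total elapsed nanoseconds divided by the number of calls.
    start.elapsed().as_nanos() as f64 / f64::from(iters)
}
```

Calling time_per_call(1_000_000, Instant::now) and time_per_call(1_000_000, SystemTime::now) should give figures in the same ballpark as the #[bench] numbers, though the loop overhead is included in the result.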

I managed to get it to work for newer Linux kernels using CLOCK_MONOTONIC_COARSE, which is a coarse-grained but higher-performance monotonic clock. This uses the libc crate from crates.io.

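/// Reads the coarse monotonic clock as (seconds, nanoseconds). Linux only.
#[cfg(target_os = "linux")]
pub fn now_monotonic() -> (i64, i64) {
    let mut time = libc::timespec {
        tv_sec: 0,
        tv_nsec: 0,
    };
    // SAFETY: `time` is a valid, writable timespec for the duration of the call.
    let ret = unsafe { libc::clock_gettime(libc::CLOCK_MONOTONIC_COARSE, &mut time) };
    assert_eq!(ret, 0, "clock_gettime(CLOCK_MONOTONIC_COARSE) failed");
    // The field types are platform-dependent, so widen them explicitly.
    (time.tv_sec as i64, time.tv_nsec as i64)
}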
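
The (seconds, nanoseconds) pair still has to be turned into a duration. Here is a minimal sketch of that step, assuming two readings taken with now_monotonic(). elapsed_between is a hypothetical helper name, and returning None when the clock steps backwards is just one possible way to handle the "not always monotonic" case from the question.

```rust
use std::convert::TryFrom;
use std::time::Duration;

/// Elapsed time between two (seconds, nanoseconds) clock readings.
/// Returns None if `end` is earlier than `start` (the clock went backwards).
/// `elapsed_between` is a hypothetical helper, not part of std or libc.
pub fn elapsed_between(start: (i64, i64), end: (i64, i64)) -> Option<Duration> {
    const NANOS_PER_SEC: i128 = 1_000_000_000;
    // Flatten each reading into total nanoseconds; i128 cannot overflow here.
    let to_nanos = |(s, ns): (i64, i64)| i128::from(s) * NANOS_PER_SEC + i128::from(ns);
    let diff = to_nanos(end) - to_nanos(start);
    if diff < 0 {
        return None;
    }
    let secs = u64::try_from(diff / NANOS_PER_SEC).ok()?;
    // The remainder is always in 0..1_000_000_000, so it fits in a u32.
    let nanos = (diff % NANOS_PER_SEC) as u32;
    Some(Duration::new(secs, nanos))
}
```

A caller that can tolerate the occasional backwards step could simply treat None as a zero duration with unwrap_or(Duration::ZERO).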

Some performance numbers (taken from a VM, so take with a grain of salt):

test ...::bench_get_monotonic_time ... bench:     44 ns/iter (+/- 57)
test ...::bench_get_system_time    ... bench:     46 ns/iter (+/- 6)
test ...::bench_now_monotonic      ... bench:      9 ns/iter (+/- 0)

If you're on a VM, you should make sure the paravirtualization mode is set properly so that the time lookup doesn't end up requiring a hypercall.

It doesn't seem to be a problem in practice. In newer versions of the code (although I didn't really change anything, so frankly I'm not sure what's different), it takes 5 ns/iter. That's a ~9-fold improvement over the 44 ns/iter of Instant::now(). By comparison, when I run the same benchmarks on a bare-metal Linux laptop, I only get a ~5-fold improvement between the CLOCK_MONOTONIC_COARSE and Instant::now() versions. So I'm pretty sure I've squeezed out whatever optimizations I can, even on the VM.

Have you tried calling libc::clock_gettime with CLOCK_MONOTONIC instead of CLOCK_MONOTONIC_COARSE? CLOCK_MONOTONIC is what's used in std::time::Instant::now().
Other than that, I can only think of inlining as the reason.

On Linux, CLOCK_MONOTONIC is significantly slower than CLOCK_MONOTONIC_COARSE (45 ns/iter vs 5 ns/iter in a Linux VM on my Mac). This is consistent with the clock_gettime(3) man page, which says about CLOCK_MONOTONIC_COARSE:

A faster but less precise version of CLOCK_MONOTONIC. Use when you need very fast, but not fine-grained timestamps.

On x86/x86_64, you could also try to use the RDTSCP/RDTSC instructions. There is a crate for that, but I've never used it so I cannot make any recommendation about it.
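
For illustration, reading the counter directly needs no crate, since the standard library exposes the intrinsic as core::arch::x86_64::_rdtsc. The sketch below is not the API of the crate mentioned above. now_ticks and estimate_ticks_per_sec are hypothetical names, the non-x86_64 fallback exists only so the code compiles everywhere, and the calibration is a rough estimate that assumes the counter runs at a constant rate and is synchronized across cores.

```rust
use std::time::{Duration, Instant};

/// Raw tick count from the CPU timestamp counter (x86_64 only).
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
pub fn now_ticks() -> u64 {
    // SAFETY: RDTSC is part of the baseline x86_64 instruction set.
    // (An OS can disable it for user mode, but that is rare.)
    unsafe { core::arch::x86_64::_rdtsc() }
}

/// Fallback for other architectures (an assumption made for portability):
/// nanoseconds since the first call, taken from Instant.
#[cfg(not(target_arch = "x86_64"))]
pub fn now_ticks() -> u64 {
    use std::sync::OnceLock;
    static START: OnceLock<Instant> = OnceLock::new();
    START.get_or_init(Instant::now).elapsed().as_nanos() as u64
}

/// Rough ticks-per-second estimate: count ticks across a sleep of `sample`
/// and divide by the wall-clock time that actually passed.
pub fn estimate_ticks_per_sec(sample: Duration) -> f64 {
    let wall = Instant::now();
    let t0 = now_ticks();
    std::thread::sleep(sample);
    let t1 = now_ticks();
    t1.wrapping_sub(t0) as f64 / wall.elapsed().as_secs_f64()
}
```

Ticks are not nanoseconds, so a duration is (t1 - t0) divided by the estimated rate. Calibrating once at startup keeps the hot path down to the single instruction.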

Oh that's awesome! On Mac, it's ~1/3 faster than the previous fastest.

Here are the results on Mac (bench_now_macos_libc uses mach_absolute_time):

test ...::bench_now_amd64_ticks   ... bench:     11 ns/iter (+/- 1)
test ...::bench_now_default       ... bench:     18 ns/iter (+/- 2)
test ...::bench_now_macos_libc    ... bench:     18 ns/iter (+/- 1)

And in a Linux VM (bench_now_linux_libc uses clock_gettime with CLOCK_MONOTONIC_COARSE):

test ...::bench_now_amd64_ticks   ... bench:     11 ns/iter (+/- 1)
test ...::bench_now_default       ... bench:     43 ns/iter (+/- 20)
test ...::bench_now_linux_libc    ... bench:      5 ns/iter (+/- 1)

The Coarsetime crate might be exactly what you are looking for.
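
If pulling in a crate is not an option, another general way to trade precision for speed is to cache the time in an atomic that a background thread refreshes, so the hot path is a single load. The sketch below only illustrates that trade-off and makes no claim about how Coarsetime is implemented. CachedClock and its methods are hypothetical names, and the reading can be stale by up to roughly one tick (more if the updater thread is delayed).

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical cached clock: a background thread stores the elapsed time
/// into an atomic, and readers only pay for an atomic load.
pub struct CachedClock {
    nanos: Arc<AtomicU64>,
}

impl CachedClock {
    /// Starts the updater thread, which refreshes the value every `tick`.
    pub fn start(tick: Duration) -> Self {
        let nanos = Arc::new(AtomicU64::new(0));
        let weak = Arc::downgrade(&nanos);
        let origin = Instant::now();
        thread::spawn(move || {
            // The loop ends once the CachedClock (the only strong owner) is dropped.
            while let Some(shared) = weak.upgrade() {
                shared.store(origin.elapsed().as_nanos() as u64, Ordering::Relaxed);
                // Release the strong reference before sleeping.
                drop(shared);
                thread::sleep(tick);
            }
        });
        CachedClock { nanos }
    }

    /// Time since `start` as of the last refresh.
    pub fn recent(&self) -> Duration {
        Duration::from_nanos(self.nanos.load(Ordering::Relaxed))
    }
}
```

Durations are then the difference between two recent() readings, with a resolution no better than the chosen tick.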


Oh that's awesome; thanks!