The state of time in Rust: leaps and bounds

In this topic, I will try to review the current landscape of Rust libraries for handling dates and times. In particular, I will touch on the thorny issues concerning support for time standards and UTC leap seconds.

The problem space

There are several different quantities used by software to represent dates and times, each with their own properties and complexities.

A timestamp measures the linear count of seconds since an agreed-upon moment, usually the Unix epoch (1970-01-01T00:00:00Z). For efficiency, arithmetic involving durations in uniform time units is often performed on timestamps rather than on broken-down calendar time values.

A date and time in its simplest form refers to a date and a time of day. A date-time value has real-world meaning when it is associated with a calendar and a relationship to a standard time scale usually formulated as a fixed offset. This article will only consider the proleptic Gregorian calendar as it is the calendar that most software applications solely need to deal with, as well as the only calendar designated for worldwide communication by ISO 8601 and various Internet standards. Similarly, most applications deal with times offset from UTC, but there are complications with this time standard which are detailed below.

A civil date and time is a date and time designated by the authorities governing time zones. A time zone is normally defined by the principal offset from UTC and possibly a daylight saving time (DST) offset applied on a certain schedule within each year. The history of changes in the time zone definition needs to be considered for local calendar times that refer to the past. Crucially, this is the legal definition of what a local date and time, e.g. "February 24, 12:34 in Helsinki", refers to.

Time standards and leap seconds

There are several internationally adopted standards for the measurement of time, of which UTC and TAI are of the most practical interest.

International Atomic Time (TAI) is a standard realized by tracking atomic clocks at metrological institutions around the world. It is a continuous time scale not connected to Earth's rotation (though realized in Earth's reference frame, if you're into relativistic effects).

Coordinated Universal Time (UTC) is derived from TAI to more accurately correspond to solar days. UTC is kept at an offset of a whole number of seconds from TAI. Due to irregularities in Earth's rotation, leap seconds are occasionally introduced by the IERS to increment the offset by one second (or decrement it, but negative leap seconds have not been needed yet) by extending the minute 23:59 of a chosen calendar day to last 61 seconds (or shortening it to 59 seconds, in case of a negative leap second).

The existence of leap seconds means that the mapping between UTC and timestamps on a continuous, physically representative time scale cannot be computed with simple logico-arithmetical formulas and must involve a table of the published leap seconds. Furthermore, this mapping changes as information about newly introduced leap seconds is disseminated. As a consequence of this, when Alice and Bob must agree on which timestamp a future UTC time refers to or vice versa, their tables of leap seconds must be synchronized.
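A minimal sketch of such a table lookup follows. The table is deliberately truncated to the five most recent changes of the TAI - UTC offset, so it only answers for instants on or after 2006-01-01; the constant and function names are made up for illustration and do not come from any of the crates discussed here.

```rust
/// (Unix timestamp at which the offset takes effect, TAI - UTC in seconds).
/// Truncated to the five most recent entries; a real table starts in 1972
/// and must be updated whenever a new leap second is announced.
const TAI_MINUS_UTC: &[(i64, i64)] = &[
    (1_136_073_600, 33), // 2006-01-01
    (1_230_768_000, 34), // 2009-01-01
    (1_341_100_800, 35), // 2012-07-01
    (1_435_708_800, 36), // 2015-07-01
    (1_483_228_800, 37), // 2017-01-01
];

/// Returns TAI - UTC for a POSIX timestamp, or `None` if the instant
/// predates the (truncated) table.
fn tai_minus_utc(unix_secs: i64) -> Option<i64> {
    TAI_MINUS_UTC
        .iter()
        .rev()
        .find(|&&(start, _)| unix_secs >= start)
        .map(|&(_, offset)| offset)
}

fn main() {
    // One second before and after the leap second at the end of 2016.
    println!("{:?}", tai_minus_utc(1_483_228_799));
    println!("{:?}", tai_minus_utc(1_483_228_800));
}
```

If Alice's copy of the table lacked the last row, she and Bob would disagree by one second on every instant after 2017-01-01, which is the synchronization problem described above.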

To round off this introduction, it's worth mentioning that it has been decided at the General Conference on Weights and Measures to sunset leap seconds (pardon my astronomical joke here) by 2035.

State of the art in software outside Rust

Unsurprisingly, the kind of intricacies needed to properly address the UTC discontinuity events that have only occurred 27 times since 1972 meets pushback from the software industry. Most widely used programming environments do not support leap seconds, performing calculations over calendar time as if the leap seconds did not and will not exist at all (by implication, the solution to a negative leap second is to insert a notional missing second). This permits simple date and time computations on the proleptic Gregorian calendar, with units up to the week being uniform multiples of the second.

Java's JSR-310 API is the closest one I've come across to providing some acknowledgement of leap seconds and admitting that its data model cannot fully represent UTC. It stops short of mandating much specific behavior, though:

Implementations of the Java time-scale using the JSR-310 API are not required to provide any clock that is sub-second accurate, or that progresses monotonically or smoothly. Implementations are therefore not required to actually perform the UTC-SLS slew or to otherwise be aware of leap seconds. JSR-310 does, however, require that implementations must document the approach they use when defining a clock representing the current instant.

ECMAScript's in-progress TC39 proposal on Temporal has this note:

Although Temporal does not deal with leap seconds, dates coming from other software may have a second value of 60. In the default 'constrain' mode and when parsing an ISO 8601 string, this will be converted to 59. In 'reject' mode, this function will throw, so if you have to interoperate with times that may contain leap seconds, don't use reject.

Among protocols used to exchange time data, text formats such as RFC 3339 tolerate leap seconds, or, in fact, any time specification referring to 23:59:60 in UTC, with only vague language about possible validation. On the other hand, specifications of some binary formats, notably the Timestamp well-known message type for Protobuf, expressly forbid accounting of leap seconds in their interpretation.

Unix real-time clock: is worse really better?

POSIX timestamps represent the number of seconds since the Unix epoch. Unix standard library functions converting between timestamps and broken-down calendar time ignore leap seconds.
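To make the consequence concrete, here is a sketch of a leap-second-blind conversion from broken-down UTC time to a POSIX-style timestamp. It uses a widely known days-from-civil computation on the proleptic Gregorian calendar; the function names are illustrative and are not taken from any C library or crate.

```rust
/// Days between 1970-01-01 and the given proleptic Gregorian date.
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Treat March as the first month so the leap day falls at year end.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let year_of_era = y.rem_euclid(400);
    let shifted_month = if month > 2 { month - 3 } else { month + 9 };
    let day_of_year = (153 * shifted_month + 2) / 5 + day - 1;
    let day_of_era = year_of_era * 365 + year_of_era / 4 - year_of_era / 100 + day_of_year;
    era * 146_097 + day_of_era - 719_468
}

/// POSIX-style timestamp: every day is exactly 86 400 seconds long.
fn posix_timestamp(y: i64, mo: i64, d: i64, h: i64, mi: i64, s: i64) -> i64 {
    days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s
}

fn main() {
    let leap = posix_timestamp(2016, 12, 31, 23, 59, 60);
    let next = posix_timestamp(2017, 1, 1, 0, 0, 0);
    // The leap second and the following midnight collapse into one value.
    println!("{leap} {next}");
}
```

Because the formula assumes uniform days, the real leap second 2016-12-31T23:59:60Z has no timestamp of its own.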

For a variety of mostly historical reasons, leap seconds have been solved on Unix systems by setting the wall-clock time back by a second. The wall clock is generally allowed by POSIX to make discontinuous jumps in either direction, so this is not a new concern; Unix programs using system wall-clock timestamps to measure elapsed time should be prepared to gracefully handle the non-monotonic clock. However, if an application expects the clock to be synchronized via NTP, leap seconds become "dangerous time" when the system clock abruptly deviates from UTC by up to a second, yet there is no way to get information about this through standard APIs. Linux provides the OS-specific adjtimex system call, which works if the time synchronization daemon supports the feature and is able to get the information on the upcoming leap second in time.
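As an illustration of handling the non-monotonic clock gracefully, the sketch below measures elapsed wall-clock time with the standard library and clamps the result to zero when the clock has stepped backwards. The helper name is hypothetical, and clamping is only one possible policy; code that needs reliable intervals would normally use a monotonic clock instead.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Elapsed wall-clock time between two readings, clamped to zero when the
/// clock has stepped backwards in between (hypothetical helper).
fn elapsed_wall(earlier: SystemTime, later: SystemTime) -> Duration {
    match later.duration_since(earlier) {
        Ok(elapsed) => elapsed,
        // `later` is before `earlier`: the clock was set back.
        Err(_) => Duration::ZERO,
    }
}

fn main() {
    // Simulate a clock that was set back by one second between readings.
    let first = UNIX_EPOCH + Duration::from_secs(100);
    let second = UNIX_EPOCH + Duration::from_secs(99);
    println!("{:?}", elapsed_wall(first, second));
}
```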

NTP and leap second smear

NTP, the time synchronization protocol predominantly used on the internet, supports information on leap seconds. However, due to the issues with the system clock APIs described above, companies maintaining large computing infrastructures (Google, Amazon, and Meta among them) have implemented various leap second smear schemes, where a leap second is absorbed by gradual clock adjustments over some time interval around the discontinuity in actual UTC, leveling back with it at the end of the smear interval. For systems using smeared NTP sources (which all must agree on a particular scheme), the clock will run slow on the order of 10-100 ppm during that interval, which is within the accuracy tolerance of hardware clocks used in most computing devices and within NTP's acceptable slew rate.
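The arithmetic behind a smear can be sketched as follows, assuming a linear smear over a 24-hour window. The window length and the linear shape are assumptions chosen for illustration; actual deployments may use other intervals or curves.

```rust
/// Length of the assumed smear window: 24 hours.
const SMEAR_WINDOW_SECS: f64 = 86_400.0;

/// Fraction of the leap second already absorbed `t` seconds into the window,
/// for a linear smear (an assumption; other schemes use different shapes).
fn smeared_offset(t: f64) -> f64 {
    (t / SMEAR_WINDOW_SECS).clamp(0.0, 1.0)
}

/// Frequency change of the smeared clock, in parts per million.
fn slew_ppm() -> f64 {
    1.0 / SMEAR_WINDOW_SECS * 1e6
}

fn main() {
    println!("offset halfway: {} s", smeared_offset(43_200.0));
    println!("slew: {:.2} ppm", slew_ppm());
}
```

With a 24-hour window the slew comes to roughly 11.6 ppm, at the low end of the range quoted above; shorter windows push the rate up proportionally.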

Date and time APIs in Rust

std::time

The standard library provides only minimal and opaque facilities to quantify time and access system-provided clocks. This is quite on purpose, as the details of implementation vary with the target operating system, while the subject matter of various time scales and calendar manipulation appears too complex and opinionated to have a single standard API that would suit all users. std::time::Instant provides a monotonic (but not necessarily steady) clock, while std::time::SystemTime exposes the system's real-time clock and allows obtaining the time elapsed since another system clock time, including the value representing the Unix epoch. No details are provided on the accuracy of the clock, or on dealing with leap seconds besides specifying that SystemTime does not count them.
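A short sketch of the two clocks side by side, using only the standard library. The helper `unix_now` is a hypothetical name introduced here.

```rust
use std::time::{Duration, Instant, SystemTime, UNIX_EPOCH};

/// Seconds since the Unix epoch according to the system clock, or `None`
/// if the clock is set to a time before 1970 (hypothetical helper).
fn unix_now() -> Option<u64> {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .ok()
        .map(|d| d.as_secs())
}

fn main() {
    // Monotonic clock: suitable for measuring elapsed time.
    let start = Instant::now();
    std::thread::sleep(Duration::from_millis(10));
    println!("elapsed: {:?}", start.elapsed());

    // Real-time clock: suitable for timestamps, may jump in either direction.
    println!("unix time: {:?}", unix_now());
}
```

Note that `duration_since` is fallible precisely because SystemTime values are not guaranteed to be ordered the way the program observed them.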

Third-party crates

As is often the case, crates developed by the community step in to provide functionality missing in the standard library, albeit with their own shortcomings and peculiarities. For this review, I have looked at the following popular crates, listed here with TL;DR summaries of their functionality:

  • time - simple facilities for date-time manipulation with numeric timezone offsets from UTC. No support for leap seconds, besides some allowances in input string formats.
  • chrono - more complex API on dates and times with extensible abstraction to support time zones, and quirky, over-permissive support for leap second inputs.
  • hifitime - precise measurement of time based on TAI, designed for scientific and engineering applications. UTC leap seconds are supported as fully as possible (with issues discussed below).

Solutions to specific problems

In this section, I will get into details of how the Rust crates implement specific tasks that occur with processing date and time information.

Precision

All of the libraries reviewed here use time and duration values with fixed precision down to nanoseconds.

Range of dates

time can represent (with any valid timezone offset) dates in the ±9999 year range. Enabling the large-dates feature extends the year range to ±999999 (changing behavior with compile-time features is a footgun, but let's not digress here).

chrono supports the date range of about ±262000 years from the Common Epoch.

hifitime can represent years in the range of about ±3276800 around the reference epoch of 1900-01-01 12:00 TAI. So if you code the software of a space probe in Rust using this library, it will still be ticking correctly when it reaches another star system.

Arithmetic with numeric durations

The simplest and most commonly occurring kinds of computation are finding the duration in seconds elapsed between two time references, and adding or subtracting a duration from a time reference to get another one in the same domain.

Duration arithmetic in time operates on UTC timestamps, ignoring leap seconds (as of version 0.3.34). This is not suitable for applications that require precise calculations on approximated physical time, but is sufficient for many other purposes. The calculations should be interoperable with the standard library's SystemTime on most platforms. Instant is also provided for dealing with monotonic clock readings and wraps std::time::Instant.

The same kind of arithmetic is available in chrono, but with a twist: the broken-down representation of time allows leap seconds. In fact, it allows constructing any time value with an interstitial second after 59, because it might be a leap second and the core library has no way to verify it. The motto of its unorthodox approach to leap second handling is "it allows for leap seconds but behaves as if there are no other leap seconds". What we get out of this is kinky math that is non-associative and, so long as there are binary operations that may operate on two notional leap seconds, just plain wrong:

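use chrono::{DateTime, TimeDelta};

fn main() {
    let t1 = DateTime::parse_from_rfc3339("2024-02-24T12:34:59.5Z").unwrap();
    // Second 60 is accepted even though no leap second occurs at this time.
    let t2 = DateTime::parse_from_rfc3339("2024-02-24T12:34:60.5Z").unwrap();
    assert_ne!(t1, t2);
    // Adding the same duration to two distinct values yields equal results.
    let delta = TimeDelta::milliseconds(500);
    assert_eq!(t1 + delta, t2 + delta);
    // One minute apart on the clock face, yet the difference is 61 seconds.
    let t3 = DateTime::parse_from_rfc3339("2024-02-24T12:35:60.5Z").unwrap();
    assert_eq!(t3 - t2, TimeDelta::seconds(61));
}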

With this kind of laxity, my advice is to not use chrono in applications that can't operate on the "garbage in, garbage out" principle, or at least apply external validation so that leap second values cannot occur. Another concern is performance: there is CPU overhead in checking for special cases that only legitimately occur in 27 seconds over the last 50+ years.

hifitime operates on TAI timestamps, so this seems to be a good fit for scientific or engineering applications that need the continuous time scale. Be aware, though, that the machine's system clock may not be good enough as an input.

Mapping to UTC

To discuss the mathematical properties of how fixed-precision values represent the abstract UTC time scale in this section, let's consider the set of discrete time moments in UTC rounded to whole nanoseconds and bound by the supported date range of the library in question.

In time, the mapping of OffsetDateTime to UTC is not surjective: times within leap seconds do not have a representation.

In chrono any valid UTC time can be represented, but, as shown above, any values of DateTime within the second number 60 are permitted (internally represented as a supernumerary billion of nanoseconds added to second 59), not only valid leap seconds. The data domain is non-linear and does not have a well-defined mapping to UTC for all values.

hifitime gets the bijective mapping to UTC right, with the proviso of how up to date the internally used list of leap seconds is. There is an extension trait to provide this list from an external source (thus shifting the responsibility, not solving the problem), but it is not used for most convenient conversions and string formatting. So to keep up with the latest leap seconds published by the IERS, applications are supposed to regularly update to the latest version of the crate.

Importing leap second times

Date and time references may be exchanged in a format permitting the second value of 60, such as RFC 3339 or RFC 2822.

time accepts second 60 when parsing these formats, with the restriction that the time in UTC must refer to the last minute of the last day of a calendar month (which follows the practice that has always been used by the IERS). The time is then converted to 23:59:59.999999999 in UTC, that is, the last nanosecond preceding the notional leap second.
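The restriction described above can be expressed as a small predicate. This is a standalone sketch of the rule, not the time crate's actual code; both function names are hypothetical.

```rust
/// Number of days in a month of the proleptic Gregorian calendar.
fn days_in_month(year: i32, month: u8) -> u8 {
    match month {
        4 | 6 | 9 | 11 => 30,
        2 if year % 4 == 0 && (year % 100 != 0 || year % 400 == 0) => 29,
        2 => 28,
        _ => 31,
    }
}

/// True if a UTC time with second 60 could be a leap second under the rule
/// described above: last minute of the last day of a calendar month.
fn plausible_leap_second(year: i32, month: u8, day: u8, hour: u8, minute: u8) -> bool {
    (1..=12).contains(&month)
        && day == days_in_month(year, month)
        && hour == 23
        && minute == 59
}

fn main() {
    // An actual leap second, and the bogus one from the chrono example.
    println!("{}", plausible_leap_second(2016, 12, 31, 23, 59));
    println!("{}", plausible_leap_second(2024, 2, 24, 12, 34));
}
```

A check of this kind rejects most malformed inputs without any table, but it still accepts second 60 at the end of months in which no leap second was ever inserted.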

chrono accepts and internally represents times with second 60, regardless of any other components of the date-time value, as shown in the example above. Again, beware of using this library to work with untrusted inputs.

hifitime correctly validates UTC leap second times for the hardcoded list of latest known leap seconds, and invalidates times with the second number of 60 that are not found on the list. See the previous section for how this can still be incorrect.

Time zone offsets

A simpler level of support for the world's time zones is to operate with numeric offsets from UTC. If the date-time value carries information on the offset, and the leap seconds are ignored, arithmetic can be performed in a simple way on the proleptic Gregorian calendar without changing the offset.

The OffsetDateTime type in time features the numeric offset from UTC in the range ±25:59:59. A fallible conversion to a different offset is provided.
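To show why fixed offsets keep the arithmetic simple, here is a sketch that derives the wall-clock time of day from a POSIX timestamp and a numeric offset. It is a standalone illustration with a hypothetical function name, not the API of any of the crates.

```rust
/// Wall-clock (hour, minute, second) for a POSIX timestamp at a fixed
/// offset from UTC, ignoring leap seconds (hypothetical helper).
fn time_of_day(unix_secs: i64, offset_secs: i32) -> (u8, u8, u8) {
    let local = unix_secs + i64::from(offset_secs);
    // rem_euclid keeps the result non-negative for pre-1970 instants too.
    let secs_of_day = local.rem_euclid(86_400);
    (
        (secs_of_day / 3_600) as u8,
        (secs_of_day % 3_600 / 60) as u8,
        (secs_of_day % 60) as u8,
    )
}

fn main() {
    // The Unix epoch as seen at UTC+02:00 and at UTC-01:00.
    println!("{:?}", time_of_day(0, 7_200));
    println!("{:?}", time_of_day(0, -3_600));
}
```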

Time zone support in chrono's DateTime type is realized generically, with in-crate implementations for UTC itself (at zero cost upon monomorphization) and fixed offsets from UTC. There are conversions to these well-known time zone parameterizations as well.

The developers of hifitime have little concern for such earthly matters as local time, so there is only minimal support for UTC offsets in input or output.

Obtaining local time zone offset

The operating system provides information on the local time offset currently in effect, typically as a function retrieving the local time. On Unix, this is usually done with the localtime_r standard C library function, but it's generally unsound in Rust due to unguarded use of the process' environment variables.

time provides methods to obtain the system's UTC offset if the local-offset feature is enabled. There is also an unsafe function, whose use is heavily discouraged, that opts into the unsoundness and enables unconditional use of localtime_r on Unix. Unfortunately, without opting in, the library fails (gracefully) to retrieve the local offset on most Unix platforms, including Linux. This is only avoided if the implementation detects that the process runs only a single thread, which on Linux is done by reading /proc/self/stat. In earlier versions of the crate, the unsoundness was not opt-in, which earned it a security advisory.

chrono used to take the same unsound approach. After getting slapped with a security advisory of its own, this was changed to a pure Rust implementation reading the TZ environment variable and working with data from the system timezone database.

hifitime has no support for local time, period.

Civil time zones

Time zones are the trickiest subject in date and time manipulation. First, the definitions of time zones are subject to change; IANA maintains a frequently updated database of the world's time zones, which is integrated by software platform vendors. Second, civil date and time computations have more surprising failure cases: a periodic job that is scheduled to run every night at 03:01 Helsinki time will not run on March 31, 2024 due to the DST transition.
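The Helsinki failure case can be reproduced with a toy model of a single offset transition. Everything below is a sketch: the types and the `resolve` function are hypothetical, and a real implementation would consult the full list of transitions from the time zone database rather than one hardcoded entry.

```rust
/// Outcome of mapping a local wall-clock time to UTC offsets.
#[derive(Debug, PartialEq)]
enum Resolution {
    /// The local time was skipped by a forward transition.
    Gap,
    /// Exactly one offset (in seconds east of UTC) applies.
    Single(i64),
    /// The local time occurred twice; both offsets are candidates.
    Ambiguous(i64, i64),
}

/// A single offset change at the UTC instant `at_utc` (Unix seconds).
struct Transition {
    at_utc: i64,
    before: i64,
    after: i64,
}

/// Resolves `local`, a wall-clock reading encoded as if it were a Unix
/// timestamp in UTC, against one transition.
fn resolve(local: i64, tr: &Transition) -> Resolution {
    let old_wall = tr.at_utc + tr.before; // wall clock at the switch, old offset
    let new_wall = tr.at_utc + tr.after; // wall clock at the switch, new offset
    let (lo, hi) = (old_wall.min(new_wall), old_wall.max(new_wall));
    if local < lo {
        Resolution::Single(tr.before)
    } else if local >= hi {
        Resolution::Single(tr.after)
    } else if tr.after > tr.before {
        Resolution::Gap
    } else {
        Resolution::Ambiguous(tr.before, tr.after)
    }
}

fn main() {
    // Europe/Helsinki, 2024-03-31: UTC+2 becomes UTC+3 at 01:00 UTC.
    let spring = Transition { at_utc: 1_711_846_800, before: 7_200, after: 10_800 };
    // 03:01 local wall-clock time on that day.
    let three_oh_one = 1_711_843_200 + 3 * 3_600 + 60;
    println!("{:?}", resolve(three_oh_one, &spring));
}
```

The scheduler in the example has to decide what a gap means for it (skip, run at 04:01, run at the transition instant); the same goes for the ambiguous hour in autumn, when the job could run twice.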

time does not currently support time zone information. Jacob Pratt, the principal developer, has stated his intent to add support for tzdata at some future date. A complementary time-tz crate provides time zone conversions and compiled-in time zone data, deriving some of its API, and the problem of keeping up to date, from chrono and chrono-tz.

chrono has both a generic API for time zones and fallible methods for civil date manipulation, such as adding a number of days to a DateTime. chrono-tz is a companion crate to chrono that provides civil time zone data from the IANA database as compiled-in constants. Updates to the time zones are meant to be applied by updating the crate dependency to the newest release and rebuilding the application, which can be considered an unsatisfactory method of keeping up with the legislative activities of more than a hundred governing bodies around the world. How do you even semver that? The tzfile crate provides integration of timezone database data from /usr/share/zoneinfo, but this functionality is only available on some Unix platforms, most importantly Linux and macOS.

The developers of hifitime, quite possibly, scoff at the messy humanity at large, seeing how we are unable to divide our geoid into regularly shaped time zones and how we resort to playing tricks with daylight time in a pathetic attempt to improve our productivity and energy conservation.

Conclusion

From this review, it should be evident that none of the crates cover all common use cases without significant unresolved issues. time is sufficient for numeric date-time data based on UTC (sans the leap seconds) and validation of formatted inputs, but its support for reading local time without the unsoundness hack leaves a lot to be desired. chrono deserves praise for its implementations of local time and time zone manipulation, but its data domain is too loose for applications conscious about correctness and security. hifitime is the best fit for scientific and engineering applications which need a precise and continuous time scale, but is not designed for dealing with local time.


Excellent overview! It's clear that you have done your research and delved into the code itself. I don't see any errors regarding the time crate, with the only (inconsequential) misstatement being that IERS mandates leap seconds at the end of a month, rather than it merely happening to always be the case.

I am happy to state that tzdb integration into time is coming along, albeit slowly. I do intend on further limiting handling of leap seconds to those known to have occurred (for instances prior to an "expiration" value).


This was a great read. Bravo!


Thanks for your response and the correction Jacob!

How do you intend to solve the up-to-dateness problem? I think integrating with the system-installed tzdata is the best a library can do on this, at least it moves the responsibility to where it can be managed more easily. But it's not available on Windows, wasm, no_std environments to name a few, so there has to be a fallback to the compiled-in data. Also, reading and parsing a file under the hood seems like a lot to do on every RFC3339 string parse.

I think this responsibility and the associated state-keeping should be made explicit to the library user by abstracting over a leap second provider trait like hifitime partially does. Same for the time zones, for which chrono and time-tz provide examples of a TimeZone trait.
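To illustrate what I mean, here is a rough sketch of such a provider abstraction. The trait and both implementations are hypothetical and do not mirror hifitime's actual extension trait; the compiled-in variant carries only the most recent table entry to keep the example short.

```rust
/// Source of leap second data (hypothetical trait, not an existing API).
trait LeapSecondProvider {
    /// TAI - UTC in seconds at the given Unix timestamp, if known.
    fn tai_minus_utc(&self, unix_secs: i64) -> Option<i64>;
}

/// Data baked into the binary at compile time.
struct CompiledIn;

impl LeapSecondProvider for CompiledIn {
    fn tai_minus_utc(&self, unix_secs: i64) -> Option<i64> {
        // Only the latest entry (2017-01-01) is included for brevity.
        (unix_secs >= 1_483_228_800).then_some(37)
    }
}

/// Table supplied at run time, e.g. parsed from a system file by the caller.
struct Loaded {
    /// (effective Unix timestamp, TAI - UTC), sorted by timestamp.
    entries: Vec<(i64, i64)>,
}

impl LeapSecondProvider for Loaded {
    fn tai_minus_utc(&self, unix_secs: i64) -> Option<i64> {
        self.entries.iter().rev().find(|e| unix_secs >= e.0).map(|e| e.1)
    }
}

fn main() {
    // Library code would depend only on the trait, not on the data source.
    let providers: Vec<Box<dyn LeapSecondProvider>> = vec![
        Box::new(CompiledIn),
        Box::new(Loaded { entries: vec![(1_435_708_800, 36), (1_483_228_800, 37)] }),
    ];
    for p in &providers {
        println!("{:?}", p.tai_minus_utc(1_450_000_000));
    }
}
```

With this shape, the application decides where the table comes from and when it is refreshed, and the parsing functions take the provider as an explicit argument.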

Belated kudos to @BurntSushi for providing much exposition on leap second support in other programming environments in this Reddit thread.


Couldn't you load the data once and cache it? If you are worried about memory usage you could have a LRU cache of timezone definitions.

You could, but a long-running program would not receive changes made since the data was cached. I feel that the choice on how to manage this state should be left to the library user.

That is a good point, for a long running program you could either set up file monitoring (e.g. inotify) or handle it the same way as configuration file reload (e.g. SIGHUP). (These example mechanisms are Linux/Unix specific, because that is what I know, I presume other systems have equivalents.)

Also, Windows seems like an annoying issue, as it doesn't have tzdata as I understand it. But surely it must have the same data in some other way, or it couldn't handle local time for the system clock correctly on the OS level. Is this data not exposed to user applications (even if not on the same format that Unixes expose it)?

And then what about IOT/embedded products that run on bare metal, how do you handle timezones there?

Yes, assuming existence of /usr/share/zoneinfo, or even a filesystem, is too much for the broad spectrum of targets that a basic time manipulation library such as time should be available on. There should be an abstraction trait to be able to plug in compiled-in data, a Unix tzdata file (cached with the instance), or a custom solution.


This is a nice overview. It about matches my perception of things too. But I would want to ask:

Why don't existing datetime libraries (outside of Rust) that offer a high level and convenient API support leap seconds? I'm thinking specifically about TC39's Temporal project. That project has gotten person years of experts carefully thinking about datetime. If you read their discussions, it's very clear that they care deeply about avoiding footguns and getting things correct. Yet, they almost entirely punt on the leap second issue. Why is that? My favorite explanation is "it doesn't matter that much." But I don't know for sure. Most leap second bugs I've heard about are really just bugs in systems that assume the system clock is monotonic. I don't think I've heard about bugs related to "the span of time computed was one second less than it should have been." I can obviously imagine use cases where that really does matter (scientific calculations or medical devices), but those use cases typically have more specialized datetime libraries available to them. (I'd put hifitime in this bucket.) I've searched for an answer to this question and I've yet to find a satisfactory answer. ("because programmers don't want to deal with them" might work for any given individual programmer, but that doesn't fly for me in the TC39 case.)

I'd also inquire about DST transitions. Those seem like something that is far more likely to bite you than leap seconds will. I believe only chrono-tz supports them.

Finally, I think you missed icu_calendar (along with some other crates in that family like icu_timezone) in your analysis. Although there do appear to be some critical concepts missing there (durations?), it's hard for me to say because I've found it difficult to get a handle on what the API offers. Clearly though it does have support for many different kinds of calendars and is also taking on the i18n task.


It does seem to only include calendars still in use in modern times (which is probably for the better). I don't know of any library (in any language) for dealing correctly with historic calendars. E.g. not all areas (which don't necessarily correspond to modern countries) switched to the Gregorian calendar at the same time, and some of them switched back and forth a couple of times. My own native Sweden has probably the most messed up European example, which includes February 30th.

Of course, if you wanted to support things like this you would need to deal with not just ambiguous times (as can happen during DST transitions) but also ambiguous dates. And include a reference to the precise geographical location (different areas of modern day Switzerland switched at different points in time, a few I believe switched back and forth a couple of times).

Speaking of which, I didn't see it mentioned in the analysis in OP, but how do these libraries deal with ambiguous timestamps when using a geographical timezone (Europe/Stockholm for example) as opposed to a fixed offset timezone? I believe it would only be applicable to chrono at this point?

I agree. I'd expect such historic calendars to have very limited use cases. To the point that if you need to do computation with them, you probably should build something bespoke for your specific use case.

LocalResult is Chrono's answer to this. I agree that this question is only applicable for Chrono currently.


Weeel... IIUC it's not strictly true. The relevant resolution states the following:

propose a new maximum value for the difference (UT1-UTC) that will ensure the continuity of UTC for at least a century

So hypothetically programmers in 2135 or later may have to deal with a leap minute (assuming the Earth's rotation will not slow dramatically before that). But I guess a more likely scenario is that the maximum value will simply be increased again and it will be recommended to move time zones instead several centuries later, since introducing a leap something after a whole century will become impractical and prohibitively damaging. Also, a bigger issue will probably be migration from TAI to TCB with associated relativity shenanigans between Earth and Lunar/Martian colonies.

time-tz redid the same API on top of time. We're in the process of making its analog of LocalResult a proper Result as well.


Having an option between vendored and system-provided is precisely what will be done. There is no need to parse a file on every parsing of an RFC3339 string. If you're referring to time zones, the spec only supports fixed offsets. If referring to leap seconds, it would absolutely be an external crate (maintained by myself). Note that even if it were decided to parse the leap second file rather than rely on external data, those files have expiration dates baked into them. So it would only be necessary to check if it's past that point, which is trivial to check. Even then, parsing a file is surprisingly cheap with the parser I wrote, particularly for the smallest of files (which leap second lists are).

So that would be a crate providing a compiled-in table of leap seconds, correct? Then whenever the IERS decides to publish a new leap second, all applications that use that crate in their dependency graph (mostly as an indirect dependency through time) would need to be rebuilt to use the latest release of the crate. While this is possible for most applications, I would also like to see a more centralized way to manage this, one that does not involve recompilation and redeployment.

What to do if the time is past the expiration date? If OffsetDateTime::parse falls back to the current behavior, there'd still be a possibility for an application to validate an RFC3339 string that refers to an already impossible leap second because the local leap second file is outdated. So I'd rather suffer the inconvenience and make the responsibility explicit for the API user to provide up to date leap second data. If it's parsed from a file, they can also decide on how frequently they want to update it.


For clarity, the order would be

  • vendored list from IERS
  • system (if enabled)
  • current situation, but probably with more stringent checks (leap seconds always fall at the end of June or December, with March and September being fallbacks)

Huh? Seems rather limited to not support the local time on most systems. Every desktop or laptop tends to use a geographical timezone (Europe/Stockholm, Europe/Berlin, etc) rather than manually changing between fixed offsets twice a year.

time can retrieve the local time with the offset currently in use (albeit with limitations as described in the OP). The time zone changes are controlled by the OS. The layer to fully support time zones for arbitrary time computations is in the works, or you can try to use time-tz now.

The spec @jhpratt was referring to is RFC 3339, a format that does not convey information on geographical time zones.


This seems like a good tradeoff: you don't go reading a system file if the vendored list is good for the date in question.

I would even go as far as to invalidate any input with second 60 that's past the expiration date for the available leap second list. The share of cases where this would be critical is likely to be minuscule anyway (and for them it's worth providing a bring-your-own parse_with_leap_second_list method), and the onus should be on the system and/or the library to be up to date regarding the known leap seconds.
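Roughly, the check I have in mind could look like the sketch below. All the names are hypothetical, nothing here is an existing or planned API of time, and the expiration value in `main` is a placeholder rather than a real expiration date.

```rust
/// Verdict on an input whose seconds field is 60 (hypothetical type).
#[derive(Debug, PartialEq)]
enum LeapCheck {
    /// The leap second is on the list.
    Known,
    /// Covered by the list, but no leap second was scheduled then.
    NotALeapSecond,
    /// The list has expired before this instant; reject rather than guess.
    ListExpired,
}

/// Leap second list with an expiration instant, both in Unix seconds.
struct LeapList {
    /// Timestamps of the midnights immediately following each leap second.
    following_midnights: Vec<i64>,
    expires: i64,
}

/// `following_midnight` is the Unix timestamp of 00:00:00 UTC right after
/// the minute that contains the claimed second 60.
fn check_second_60(list: &LeapList, following_midnight: i64) -> LeapCheck {
    if following_midnight > list.expires {
        LeapCheck::ListExpired
    } else if list.following_midnights.contains(&following_midnight) {
        LeapCheck::Known
    } else {
        LeapCheck::NotALeapSecond
    }
}

fn main() {
    let list = LeapList {
        following_midnights: vec![1_435_708_800, 1_483_228_800],
        expires: 1_800_000_000, // placeholder, not a real expiration date
    };
    // 2016-12-31T23:59:60Z is followed by 2017-01-01T00:00:00Z.
    println!("{:?}", check_second_60(&list, 1_483_228_800));
}
```

Both `NotALeapSecond` and `ListExpired` would lead to rejecting the input; keeping them distinct lets the caller tell bad data apart from a stale list.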