The function below parses a byte string, which is expected to be all-ASCII and to take the form of a decimal fraction, into a std::time::Duration. It cannot use a floating-point intermediate, for reasons explained in the comments (basically, loss of precision), so there's a bunch of string bashing to compute the nanoseconds part of the value, and I feel like there's probably a better way to write it. In particular, I don't love the chain of manipulations done to fraction, or the while places < 9 loop at the bottom.
How would you write this function? Note: the format is pinned by external compatibility constraints, so make sure to accept exactly the same set of strings.
use std::time::Duration;

/// Parse a string (as read directly from disk, so a [u8]) as decimal
/// seconds and nanoseconds since the Unix epoch, into a Duration object.
pub fn parse_decimal_timestamp(data: &[u8]) -> Option<Duration> {
    // An f64 can only represent Unix timestamps with full nanosecond
    // precision if they are within ±2**53 nanoseconds of
    // 1970-01-01T00:00:00.000000000Z. This is a smaller range than
    // you might expect: only from 6pm on September 18, 1969 until 6am
    // on April 15, 1970. So, we cannot use f64 as an intermediary here.
    let ts = str::from_utf8(data).ok()?;
    let (seconds, fraction) = ts.split_once(".").unwrap_or((ts, ""));
    let fraction = fraction.trim_end_matches("0");
    let fraction = if fraction == "" {
        "0"
    } else if fraction.len() <= 9 {
        fraction
    } else {
        // Truncate to 9 digits, but reject the whole timestamp if any
        // of the discarded characters are not digits. (Non-digits in
        // the preserved part of the fraction will be rejected by
        // .parse::<u32> below.)
        if (&fraction[9..]).chars().any(|c| !c.is_ascii_digit()) {
            return None;
        }
        &fraction[..9]
    };
    let secs = seconds.parse::<u64>().ok()?;
    let mut nanos = fraction.parse::<u32>().ok()?;
    if nanos > 0 {
        let mut places = fraction.len();
        while places < 9 {
            places += 1;
            nanos *= 10;
        }
    }
    Some(Duration::new(secs, nanos))
}
If you need maximum performance, one issue is that you are validating the input as UTF-8 and then re-parsing it as a highly restricted ASCII subset. You could consider validating the input &[u8] directly.
Oh wow. It would never have occurred to me to look for pow as a function from integers to integers. That's what 30 years of C does to your brain, I guess.
Misses an error case -- that any(|c| !c.is_ascii_digit()) clause is there for a reason.
Are there guidelines anywhere for how to do that in a readable manner? There are several other places in this program where I need to validate input that's expected to be a restricted subset of ASCII, but u8 slices and OsStr have such a limited API compared to str that it's really awkward to work with them.
The bstr crate is often recommended, there is probably other helpful stuff on crates.io as well. Unfortunately, I personally don't have much experience in this department.
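That said, with just std, a couple of small helpers can keep byte-slice handling fairly readable. A sketch (the helper names here are made up, not from any crate):

```rust
/// Split a byte slice on the first occurrence of `sep`
/// (a std-only stand-in for str::split_once).
fn split_once_byte(data: &[u8], sep: u8) -> Option<(&[u8], &[u8])> {
    let i = data.iter().position(|&b| b == sep)?;
    Some((&data[..i], &data[i + 1..]))
}

/// True if `data` is non-empty and consists solely of ASCII digits.
fn all_ascii_digits(data: &[u8]) -> bool {
    !data.is_empty() && data.iter().all(u8::is_ascii_digit)
}

fn main() {
    assert_eq!(split_once_byte(b"1.5", b'.'), Some((&b"1"[..], &b"5"[..])));
    assert_eq!(split_once_byte(b"15", b'.'), None);
    assert!(all_ascii_digits(b"123456789"));
    assert!(!all_ascii_digits(b"12a"));
}
```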
That doesn't work. Suppose the input is 1.123456789123456789, the desired return value is Some(Duration::new(1, 123456789)) but parsing 123456789123456789 into a u32 will overflow so your code will return None.
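Concretely (u32::MAX is 4_294_967_295, so 18 fraction digits cannot fit):

```rust
fn main() {
    // The full 18-digit fraction overflows a u32, so parse fails
    // and the whole timestamp would wrongly be rejected.
    assert!("123456789123456789".parse::<u32>().is_err());
    // The first 9 digits alone are fine.
    assert_eq!("123456789".parse::<u32>(), Ok(123_456_789));
}
```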
One potentially-undesirable detail of the current behavior is that decimal digits beyond the 9th are truncated, so the timestamp is always rounded down. You may want a different form of rounding.
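If round-to-nearest were wanted instead, one possible sketch (the helper name is hypothetical, and it assumes the seconds and the first nine fraction digits have already been parsed): bump nanos when the first discarded digit is 5 or more. Duration::new normalizes a nanos value of 1_000_000_000 by carrying into the seconds, so the carry case is handled for free.

```rust
use std::time::Duration;

/// Hypothetical rounding step: `nanos` holds the first nine fraction
/// digits, `first_discarded` the tenth digit (as an ASCII byte), if any.
fn round_half_up(secs: u64, nanos: u32, first_discarded: Option<u8>) -> Duration {
    match first_discarded {
        // Duration::new carries nanos >= 1_000_000_000 into the seconds.
        Some(d) if d >= b'5' => Duration::new(secs, nanos + 1),
        _ => Duration::new(secs, nanos),
    }
}

fn main() {
    // "1.1234567895" rounds up:
    assert_eq!(round_half_up(1, 123_456_789, Some(b'5')),
               Duration::new(1, 123_456_790));
    // "1.9999999995" carries into the seconds:
    assert_eq!(round_half_up(1, 999_999_999, Some(b'5')),
               Duration::new(2, 0));
}
```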
use std::time::Duration;

/// Parse a string (as read directly from disk, so a [u8]) as decimal
/// seconds and nanoseconds since the Unix epoch, into a Duration object.
pub fn parse_decimal_timestamp(data: &[u8]) -> Option<Duration> {
    // An f64 can only represent Unix timestamps with full nanosecond
    // precision if they are within ±2**53 nanoseconds of
    // 1970-01-01T00:00:00.000000000Z. This is a smaller range than
    // you might expect: only from 6pm on September 18, 1969 until 6am
    // on April 15, 1970. So, we cannot use f64 as an intermediary here.
    let mut secs: u64 = 0;
    let mut nanos: u32 = 0;
    'parse: {
        let mut bytes_iter = data.iter().copied();
        // Parse full seconds. Track whether we saw at least one digit so
        // that "", "." and ".5" are rejected, exactly as "".parse::<u64>()
        // rejects the empty seconds field in the original. (One remaining
        // difference: str::parse also accepts a leading `+`, which this
        // version rejects.)
        let mut saw_digit = false;
        'seconds: {
            for c in bytes_iter.by_ref() {
                match c {
                    digit @ b'0'..=b'9' => {
                        saw_digit = true;
                        secs = secs.checked_mul(10)?.checked_add(u64::from(digit - b'0'))?;
                    }
                    b'.' => break 'seconds, // Jump to nanoseconds parse code.
                    _ => return None,
                }
            }
            // Skip nanosecond parse code, because no `.` was encountered.
            if !saw_digit {
                return None;
            }
            break 'parse;
        }
        if !saw_digit {
            return None;
        }
        // Parse nanoseconds
        let mut decimals_left: u32 = 9;
        while decimals_left > 0 {
            match bytes_iter.next() {
                Some(digit @ b'0'..=b'9') => {
                    nanos = nanos * 10 + u32::from(digit - b'0');
                    decimals_left -= 1;
                }
                Some(_) => return None,
                None => {
                    // Scale up short fractions: ".5" means 500_000_000 ns.
                    nanos *= 10_u32.pow(decimals_left);
                    break 'parse;
                }
            }
        }
        // Validate the remaining (truncated) nanosecond digits.
        if !bytes_iter.all(|c| c.is_ascii_digit()) {
            return None;
        }
    }
    Some(Duration::new(secs, nanos))
}