Parsing System Time

I am currently using Rust on Linux to examine nearly a billion files. One of the things I need is each file's modified date/time in a particular format, which can conveniently be obtained using the chrono crate. I therefore extract a file's Metadata from its directory entry. While in practice error handling is very important (so the routine never crashes out and abandons hours of successful results), for the purposes of this question I will bypass all that by using .unwrap() for clarity. So I have:

let file_metadata=fs::metadata(file_path).unwrap();
let file_last_modified=file_metadata.modified().unwrap();

If I now look at the format of this:

print!("{:?} : ",file_last_modified);

I get, for example:- 'SystemTime { tv_sec: 1594033168, tv_nsec: 848710897 }' which I feel I need to somehow feed into crate chrono. I am aware of:-

let file_datestamp=Utc.timestamp(1594033168, 848710897); //(crate chrono)

which could then, for example, be followed by:

println!("{}", file_datestamp.format("%Y-%m-%d@%H:%M:%S"));

to give what I asked for:- 2020-07-06@10:59:28. Hopefully NEVER 'summer time corrected', am/pm involved, timezone ambiguous, or anything else similar to confuse exactly 'when' this means. Simple sorting by this parameter should put a list of results into the order the files were last modified as real time progressed. Yet it is human understandable, and no spaces to cause future record parsing problems.

However, in Rust I am not used to having to parse the two numbers out of the 'SystemTime' type myself, with all the format/error variations it could get up to. To say nothing of my inferior, unoptimized coding skills in Rust. I feel sure there is something (probably obvious) that I am missing that would do this much more concisely, quickly and efficiently.

I have tried to implement:-

pub fn timestamp(&self) -> i64
(Returns the number of non-leap seconds since January 1, 1970 0:00:00 UTC (aka "UNIX timestamp").) and
pub fn timestamp_nanos(&self) -> i64
(Returns the number of non-leap-nanoseconds since January 1, 1970 UTC)

found in: https://docs.rs/chrono/0.4.13/chrono/struct.DateTime.html

with every 'use' at the top I can think of:-

let file_timestamp_sec=file_last_modified.timestamp();
let file_timestamp_nanos=file_last_modified.timestamp_nanos();

but I get:-

error[E0599]: no method named timestamp found for type std::time::SystemTime in the current scope

Thank you very much in advance. (I never get on with or really understand up-voting or down-voting social implication systems.)

Yogi39

Replying to the timestamp part:

file_last_modified.timestamp() won't work because (AFAIK) chrono has no trait that implements timestamp() for SystemTime.

You have two options, either to convert SystemTime to DateTime, e.g.

let dt: chrono::DateTime<chrono::Utc> = st.into();
let t = dt.timestamp();

or calculate the timestamp directly from the SystemTime, as the duration since the start of the epoch: https://doc.rust-lang.org/std/time/constant.UNIX_EPOCH.html

let t = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().timestamp();

Note that the two approaches return different types. Chrono's timestamp() returns i64, while duration_since() returns Result<Duration, ...>, where Duration contains both sec and nsec parts.


You may want to stash the results in some persistent data store as you're going (e.g. a file on disk or a database). That way even if you crash midway through, your program is able to skip items it's already checked.

It is producing a sort of index database so that files can be located when required again, primarily as a sort of master backup system. Yes, persistent stores are extensively used. In general, my routines store things they can't handle in an error file and plough on. Problems can then be sorted and dealt with all at once, rather than the system forever stopping and needing attention over each one. The exception is when the system still panics and crashes out because it finds nowhere to output any further progress: it has spent its time filling its persistent store - an empty 4 terabyte drive - mainly with error messages around a common theme. Then the principle wastes time.
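
The "record the error and plough on" pattern described above could be sketched as follows, using only the standard library. The names modified_time and scan are hypothetical, and a real scan would presumably write the failures to the error file rather than hold them in memory.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::SystemTime;

/// Returns the last-modified time, or the I/O error for the caller to record.
fn modified_time(path: &Path) -> io::Result<SystemTime> {
    // `?` forwards a failed metadata lookup instead of panicking like unwrap().
    fs::metadata(path)?.modified()
}

/// Processes every path, collecting failures instead of stopping at the first one.
/// (Hypothetical helper: results are kept in memory purely for illustration.)
fn scan<'a>(paths: &[&'a Path]) -> (Vec<(&'a Path, SystemTime)>, Vec<(&'a Path, io::Error)>) {
    let mut ok = Vec::new();
    let mut failed = Vec::new();
    for &p in paths {
        match modified_time(p) {
            Ok(t) => ok.push((p, t)),
            Err(e) => failed.push((p, e)),
        }
    }
    (ok, failed)
}
```

Note that fs::metadata follows symbolic links; fs::symlink_metadata reports on the link itself.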

Most commonly encountered common theme: Rust's 'strings are UTF-8 or error' culture. A fair proportion of the files I deal with are older than the invention of UTF-8 - and the paths are not ASCII either. This rules out the use of so many of Rust's methods to assist handling of what are, after all, IMHO still legitimate strings. OsStrings are so badly supported I find the type useless. It is amazing how frequently this issue crops up, often in inconvenient and non-obvious ways.
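
On Linux specifically, one way around this may be the Unix-only extension trait std::os::unix::ffi::OsStrExt, which exposes an OsStr as plain bytes with no UTF-8 validation. A minimal sketch, assuming a Unix target; the helper names path_bytes and path_from_bytes are hypothetical.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only extension trait
use std::path::Path;

/// Returns the raw bytes of a path, without any UTF-8 check or conversion.
fn path_bytes(path: &Path) -> &[u8] {
    path.as_os_str().as_bytes()
}

/// Rebuilds a Path from raw bytes, again without any UTF-8 validation.
fn path_from_bytes(bytes: &[u8]) -> &Path {
    Path::new(OsStr::from_bytes(bytes))
}
```

Both functions only borrow, so nothing is copied; a name such as b"caf\xe9.txt" (Latin-1, not valid UTF-8) passes through unchanged.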

Thank you very much for your considerable help.

Your solution 1: It seems that your 'dt' is 95% of the way there, but simply to reformat it a bit, it is calculated back into an 'i64', which is simply the duration since the Unix epoch accurate to the second only. I would need timestamp_nanos to improve this. This is then calculated back again into a yr/mnth/day/hrs/mins/sec form in a slightly different format. Surely this cannot be a very efficient way of achieving my requirement?
It therefore seemed that your solution 2 should be the answer. However, this seemed to suffer from the selfsame issue that I had in the first place:-

'error[E0599]: no method named timestamp found for type std::time::Duration in the current scope'
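
The error arises because std::time::Duration has no timestamp() method; its accessors for the two parts are as_secs() and subsec_nanos(). A minimal sketch of extracting both numbers with the standard library alone follows. The function name unix_parts and the handling of pre-1970 times are assumptions made for illustration.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Splits a SystemTime into whole seconds and nanoseconds relative to the Unix epoch.
/// Times before 1970 yield negative seconds with a non-negative nanosecond part.
fn unix_parts(st: SystemTime) -> (i64, u32) {
    match st.duration_since(UNIX_EPOCH) {
        Ok(d) => (d.as_secs() as i64, d.subsec_nanos()),
        // The error value carries how far *before* the epoch the time lies.
        Err(e) => {
            let d = e.duration();
            if d.subsec_nanos() == 0 {
                (-(d.as_secs() as i64), 0)
            } else {
                // Round the seconds down and express the remainder as a positive fraction.
                (-(d.as_secs() as i64) - 1, 1_000_000_000 - d.subsec_nanos())
            }
        }
    }
}
```

These are the same two values that the Utc.timestamp(secs, nanos) call in the original question takes, so no parsing of the Debug output is needed.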

I would very much like your opinion of the following fudge:-
In the situation that I need time to 10ms (rounding not important)

let dt: chrono::DateTime<chrono::Utc> = st.into(); // from your solution 1
let dtv = format!("{:?}", dt); let dtv = dtv.as_bytes().to_vec(); // turn it into a byte vector. better offers?
if dtv.len() < 22 { /* handle this as an error */ } // What can go wrong here?
let dtf = &dtv[0..22]; // truncate to required format length

I can now adjust any formatting symbols to those required or, better still, adjust the rest of the program to this format; it meets the basic requirements. (At least I hope so from the point of view of timezones.) I need it as a u8 vector (or slice) anyway; I cannot use UTF-8 strings. I find OsStrings in Rust next to useless.

It seems such a clumsy workaround, but surely it is faster than anything else available?
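
Two points may help here. First, since dt is already a DateTime<Utc>, the dt.format(...) call from the original question can be applied to it directly, with no detour through timestamp(). Second, truncating {:?} output depends on the Debug representation, which is not a stable format contract. As an alternative, here is a sketch that builds the 22-byte stamp from the SystemTime using only the standard library. It swaps chrono for the widely used days-to-civil-date arithmetic (commonly attributed to Howard Hinnant); the names civil_from_days and stamp_bytes, and the choice to return None for pre-1970 times, are assumptions.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Converts days since 1970-01-01 to (year, month, day), proleptic Gregorian, UTC.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the origin to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // 0..=399
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, counted from 1 March
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Formats a SystemTime as ASCII bytes "YYYY-MM-DD@HH:MM:SS.cc" in UTC,
/// truncated (not rounded) to 10 ms. Returns None for times before 1970.
fn stamp_bytes(st: SystemTime) -> Option<Vec<u8>> {
    let d = st.duration_since(UNIX_EPOCH).ok()?;
    let secs = d.as_secs() as i64;
    let centis = d.subsec_nanos() / 10_000_000;
    let (y, mo, da) = civil_from_days(secs / 86_400);
    let sod = secs % 86_400; // seconds of day
    let s = format!(
        "{:04}-{:02}-{:02}@{:02}:{:02}:{:02}.{:02}",
        y, mo, da, sod / 3_600, (sod % 3_600) / 60, sod % 60, centis
    );
    Some(s.into_bytes())
}
```

For any year from 1970 to 9999 the result is exactly 22 bytes, so no length check or slicing is needed. The output is UTC by construction, with no timezone or daylight-saving adjustment.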

Not answering your question, but for formatting it is probably better to use "%Y-%m-%dT%H:%M:%SZ" to be ISO 8601 compatible. The Z explicitly tells a future parser that the timestamps are in UTC.


I do so wholeheartedly agree that standards should be used and adhered to, always as far as feasible if there is little penalty. However, one has to be aware of the penalty of adhering to a standard that was not designed for, or really relevant to, the situation, and of fitting square pegs into round holes.

Furthermore, and importantly, conformity to standards for data stored internally to a system also gives a false sense of accuracy and authenticity to the data, encouraging misuse in a context quite outside the capability of the environment it was taken from. For example, to show what I mean: a file is from an expensive ground vibration sensor. The sensor verifies its accurate clock, and if the clock were ever suspect the data would be marked. The whole file's integrity can be verified. However, what is highly suspect is the clock in the cheap laptop used by the engineer to collect from the sensor and transport the data back to base. This has given the file a 'last modified datetime' stamp somewhat (days even) before the data was generated. This leads the naive to assume, on a first sort, that the file cannot possibly contain the wanted data it does, and to reject it. It is a case of rubbish metadata, from the environment of a cheap, badly looked after laptop, being given plausibility by being well presented. However, once it has been put in a database etc., this serious error is in no way apparent.

When finished, an inquiry to the system I am writing would simply not present (meta or any other) data, so easily shown to be rubbish, as plausible. All data presented by such an inquiry would of course now be 'properly formatted'.

People maintaining the system would understand what the internal data is, its format and limitations, without the need of standards. The best format for such data is therefore best optimized for use by the system, compromised for ease of checking on what the system is doing by humans looking for system bugs. (Such as the one described above). If others try to make any use of such files, I am afraid they do so at their peril.

In line with that, it has occurred to me, the format of the line of data that contains this 'file last modified datetime' will probably be used in the database many tens of millions of times. Every extra byte in the format will make the database bigger by tens of megabytes. Not insignificant. I am seriously considering removing the slashes and colons, not expanding it to conform to standards.
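
As a rough illustration of that trade-off: dropping the five separators shortens "2020-07-06@10:59:28" from 19 bytes to 14, and because every field is fixed-width and zero-padded, a plain byte-wise sort still gives chronological order. A minimal sketch, with the hypothetical helper name compact_stamp:

```rust
/// Drops every non-digit byte from a stamp such as "2020-07-06@10:59:28".
fn compact_stamp(stamp: &[u8]) -> Vec<u8> {
    stamp.iter().copied().filter(u8::is_ascii_digit).collect()
}
```

At 5 bytes saved per record, 50 million records would come to about 250 MB less.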

Yeah agree, with your volume of data it actually makes sense.

