Reading multiple files based on a partial name?

Hello!

I am trying to build a web-based frontend for BORG Backup. I got it to store logfiles in JSON format as /var/log/borg/<date>-<name>.log, which works nicely so far.

What I am now trying to do is open each file, find some specific details, and send them off using reqwest to some API on the other end.

I can open a file that has a static filename and read content from it, but I am unsure how to proceed with multiple files that have dynamic names.

Ultimately it would be something like:
"find all files that start with the current date" (I can use Chrono for that), iterate through them, and send off the data for each iteration.

But how do I open /var/log/borg/<date>-<name>.log using Rust, while also using the current date?

Are you looking for a glob?
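For illustration, a minimal sketch of what that could look like with the glob crate (the directory and the date in the pattern are just examples):

use glob::glob;

fn main() {
    // Matches e.g. /var/log/borg/20220628-homes.log and 20220628-etc.log.
    for entry in glob("/var/log/borg/20220628-*.log").expect("invalid glob pattern") {
        match entry {
            Ok(path) => println!("{}", path.display()),
            Err(e) => eprintln!("unreadable path: {}", e),
        }
    }
}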

1 Like

Not exactly, because that would open many files, I think? :slight_smile:

For example:

Last night our home directories were backed up, so BORG logged that to /var/log/borg/20220628-homes.log, but /etc/ was also backed up and logged to /var/log/borg/20220628-etc.log.

These files include how many files were backed up and whether the run was successful. The idea is that I extract this data from those files and then use reqwest to send it off to an API built with actix. So I need to figure out how to iterate over /var/log/borg for files with the current date (20220628), then extract homes and etc from the filename so I can use that to tell the API which backup we're talking about.
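For the "extract the name from the filename" part, here is a minimal sketch under the <date>-<name>.log assumption (the helper name and the hard-coded path are made up for illustration):

use std::path::Path;

/// Extracts the backup name from a file name like "20220628-homes.log".
fn backup_name(path: &Path) -> Option<&str> {
    path.file_stem()?       // "20220628-homes"
        .to_str()?
        .split_once('-')    // ("20220628", "homes")
        .map(|(_, name)| name)
}

fn main() {
    let path = Path::new("/var/log/borg/20220628-homes.log");
    println!("{:?}", backup_name(path)); // Some("homes")
}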

--

Actually, I think you may be right :slight_smile: I need to play with this!

@H2CO3 thank you AGAIN for helping me :slight_smile:

I'm not sure what you are getting at. Aren't you trying to get the names of many files based on a pattern? Isn't that exactly what you were asking?

1 Like

That's why I said you might be right :wink:
I managed to limit it to only today's files :slight_smile:

What about using the walkdir crate to find all the files in a folder and just filtering out the names of the files you don't care about?
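A rough sketch of that approach, assuming the walkdir crate and the directory mentioned earlier (the date prefix is hard-coded here just for illustration):

use walkdir::WalkDir;

fn main() {
    for entry in WalkDir::new("/var/log/borg")
        .into_iter()
        .filter_map(Result::ok)
        .filter(|e| e.file_type().is_file())
        // Keep only file names starting with today's date, e.g. "20220628-".
        .filter(|e| e.file_name().to_string_lossy().starts_with("20220628-"))
    {
        println!("{}", entry.path().display());
    }
}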

2 Likes

Currently I have this:

use chrono::Utc;
use glob::glob;
use std::fs;

fn main() {
    // Today's date, e.g. "20220628", used as the prefix of the glob pattern.
    let date = Utc::now().format("%Y%m%d").to_string();
    let source_files = format!("{}-*", date);
    for entry in glob(&source_files).expect("Failed to read glob pattern") {
        let file_name = format!("{}", entry.unwrap().display());
        let file_content =
            fs::read_to_string(&file_name).expect("Something went wrong reading the file");
        let json: serde_json::Value =
            serde_json::from_str(&file_content).expect("JSON does not have correct format.");
        println!("Duration: {}", json["archive"]["duration"]);
        println!("Files: {}", json["archive"]["stats"]["nfiles"]);
        println!("Compressed Size: {}", json["archive"]["stats"]["compressed_size"]);
        println!("Original Size: {}", json["archive"]["stats"]["original_size"]);
    }
}

Which works nicely :slight_smile:
The backups finish around 4:00 in the morning, so with cron I can run this program in the evening. The next step is using reqwest to send it all off to an API :slight_smile:
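A rough sketch of what that reqwest step could look like with the blocking client (this assumes reqwest with the blocking and json features enabled; the endpoint URL and payload shape are made up):

use serde_json::{json, Value};

// `report` is one parsed log document (a serde_json::Value).
fn send_report(report: &Value) -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::new();
    client
        // Hypothetical endpoint; replace with the real actix API URL.
        .post("http://localhost:8080/api/backup-report")
        .json(&json!({
            "duration": report["archive"]["duration"],
            "nfiles": report["archive"]["stats"]["nfiles"],
        }))
        .send()?
        .error_for_status()?;
    Ok(())
}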

As a suggestion, since your glob pattern is simple, you could use std::fs::read_dir and Path::starts_with to avoid pulling glob into your dependencies (unless you already have it).

And also serde_json::from_reader(File::open(&path).unwrap()) would avoid allocating a temporary string. Of course what you're doing is totally appropriate for a short script running once a day.

3 Likes

I am always open to suggestions, thank you! :slight_smile: I'm still a n00b, so anything I can learn.. :wink:

My idea would look something like this:

use chrono::Utc;
use std::fs::File;
use std::path::Path;

fn main() -> std::io::Result<()> {
    let pattern = format!("{}-", Utc::now().format("%Y%m%d"));
    for entry in std::fs::read_dir(".")? {
        let entry = entry?;
        let file_name = entry.file_name();
        let file_name: &Path = file_name.as_ref();
        if !file_name.starts_with(&pattern) {
            continue;
        }
        let json: serde_json::Value = serde_json::from_reader(File::open(entry.path())?)?;
        println!("Duration: {}", json["archive"]["duration"]);
        println!("Files: {}", json["archive"]["stats"]["nfiles"]);
        println!(
            "Compressed Size: {}",
            json["archive"]["stats"]["compressed_size"]
        );
        println!(
            "Original Size: {}",
            json["archive"]["stats"]["original_size"]
        );
    }
    Ok(())
}

I haven't compiled it, maybe the std::io::Result would have to be replaced by a Result<(), Box<dyn std::error::Error>>.

1 Like

Thank you :slight_smile:
Your example didn't work, but I got it working like so:

use chrono::Utc;
use std::fs::File;
use std::path::Path;

fn main() -> std::io::Result<()> {
    let pattern = format!("{}-", Utc::now().format("%Y%m%d"));
    println!("DATE: {}", pattern);
    for entry in std::fs::read_dir("/var/log/borg")? {
        let entry = entry?;
        let file_name = entry.file_name();
        let file_name: &Path = file_name.as_ref();
        if !file_name.starts_with(&pattern) {
            let json: serde_json::Value = serde_json::from_reader(File::open(entry.path())?)?;
            println!("Duration: {}", json["archive"]["duration"]);
            println!("Files: {}", json["archive"]["stats"]["nfiles"]);
            println!(
                "Compressed Size: {}",
                json["archive"]["stats"]["compressed_size"]
            );
            println!(
                "Original Size: {}",
                json["archive"]["stats"]["original_size"]
            );
        }
    }
    Ok(())
}
1 Like

Eventually I ended up with the original code using glob; I couldn't figure out how else to limit it to only today's logs :slight_smile: The code with Path actually added all the logs :slight_smile:

It's not the fault of Path. Your condition is wrong. You are checking if the file name does not start with the pattern.
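For reference, a sketch of the relevant lines inside the read_dir loop shown earlier with the ! dropped; it also compares the file name as a string, since Path::starts_with only matches whole path components rather than arbitrary prefixes:

        let file_name = entry.file_name();
        // Process only files whose name begins with today's "YYYYMMDD-" prefix.
        if file_name.to_string_lossy().starts_with(&pattern) {
            let json: serde_json::Value = serde_json::from_reader(File::open(entry.path())?)?;
            println!("Duration: {}", json["archive"]["duration"]);
            // ... print or send the remaining fields as before ...
        }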

1 Like

Ouch.

I didn't notice the ! there.

So, hopefully I can ask for some more help :slight_smile:

It works now, and I'm getting my data as a nicely formatted JSON string that I can throw against a web API that ingests it and puts it in a database for further processing :slight_smile:
However, some files contain 2 backups, and that's where my code fails.
So a file with:
{json string}
works, but a file with:
{json string}{json string}
fails, as is to be expected, I suppose. So now I guess I need to read the file, iterate over it, and convert each string into a struct? But how do I tell it where one JSON string ends and the next one starts?

I think in that kind of case you want to use the StreamDeserializer.
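For example, a minimal sketch using serde_json's Deserializer::from_str with into_iter (the inline data here stands in for a log file holding two backup runs):

use serde_json::{Deserializer, Value};

fn main() -> Result<(), serde_json::Error> {
    // Two JSON documents back to back, as in a log file with two backups.
    let data = r#"{"archive":{"name":"homes"}}{"archive":{"name":"etc"}}"#;
    // The StreamDeserializer yields one document at a time.
    for value in Deserializer::from_str(data).into_iter::<Value>() {
        let value = value?; // each item is a Result, since any document may be malformed
        println!("name: {}", value["archive"]["name"]);
    }
    Ok(())
}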

1 Like

Thank you, that does the trick indeed!
Now I have this:

    for entry in glob(&source_files).expect("Failed to read glob pattern") {
        let file_name = format!("{}", entry.unwrap().display());
        let file_content = fs::read_to_string(&file_name).expect("Something went wrong reading the file");
        // ADDED FOR MULTIPLE VALUES
        let data = file_content;
        let stream = Deserializer::from_str(&data).into_iter::<Value>();
        for value in stream {
            let line = json!(value);
            let token = &token.to_string();
            let name = &line["archive"]["name"].to_string();
            let duration = &line["archive"]["duration"].to_string();
            let files = &line["archive"]["stats"]["nfiles"].to_string();
            let compressed = &line["archive"]["stats"]["compressed_size"].to_string();
            let original = &line["archive"]["stats"]["original_size"].to_string();
            let repository = &line["repository"]["id"].to_string();

and this gives no error in VSCode, but when I compile it I get:

33 |             let line = json!(value);
   |                        ^^^^^^^^^^^^ the trait `serde::ser::Serialize` is not implemented for `serde_json::Error`

The StreamDeserializer is an Iterator yielding Results. Each deserialization could fail.

So, following your style, you'd have to unwrap it. And the json! macro here is useless.

You should learn to actually handle errors. There are examples in this thread.
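As an example of what that could look like here, a sketch of the same kind of loop with each Result matched instead of unwrapped and the Value used directly (the inline data stands in for the file contents read in the snippet above):

use serde_json::{Deserializer, Value};

fn main() {
    // Stand-in for the contents of one log file.
    let data = r#"{"archive":{"name":"homes","duration":12.5,"stats":{"nfiles":100}}}"#;
    for value in Deserializer::from_str(data).into_iter::<Value>() {
        match value {
            // Each item is a Result; handle it instead of unwrapping,
            // and use the Value directly rather than wrapping it in json!.
            Ok(line) => {
                let name = line["archive"]["name"].to_string();
                let duration = line["archive"]["duration"].to_string();
                let files = line["archive"]["stats"]["nfiles"].to_string();
                println!("{} took {}s and contained {} files", name, duration, files);
            }
            Err(e) => eprintln!("skipping malformed JSON document: {}", e),
        }
    }
}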

1 Like

I was thinking I'd use match {} for that? Because that returns either an Ok or an Err.

Edit:

I'm halfway there! I now have:

        for value in stream {
            // println!("{}\n\n", value.unwrap());
            let mut object = value.unwrap();
            println!("name: {}",object["archive"].get_mut("name").unwrap());
            println!("files: {}",object["archive"][7].get_mut("files").unwrap());
        }

Which almost works..
I tried
println!("files: {}", object["archive"][stats].get_mut("files").unwrap());
in every possible way :slight_smile: to access

{
    "archive": {
        ...
        "stats": {
            "files": 12345,
            ...
        }
    }
}

but I can't access the data in files :expressionless: I'm probably very stupid :frowning: but any pointers as to which direction I should go? (And yes, I need to read about error handling and JSON too.)
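For what it's worth, a small sketch of reading that nested field with serde_json's Value (the inline value stands in for one parsed log document):

use serde_json::json;

fn main() {
    // Stand-in for one parsed log document.
    let object = json!({ "archive": { "stats": { "files": 12345 } } });

    // Nested objects are indexed by chaining string keys;
    // numeric indices like [7] only apply to JSON arrays.
    println!("files: {}", object["archive"]["stats"]["files"]);

    // Or via a JSON pointer, which yields an Option instead of Null when a key is missing.
    if let Some(files) = object.pointer("/archive/stats/files") {
        println!("files: {}", files);
    }
}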

I am trying to read this file:

but I'm not interested in all the fields. Should I still create a struct with all the fields?

Basically, I want to open that file, loop through the two JSON outputs, and collect some fields for further processing.

Defining a struct/enum to deserialize into is definitely the way to go. Serde will ignore fields you didn't define, so just define the struct's fields you care about.
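A minimal sketch of what that could look like for the fields used earlier in this thread (assuming serde with the derive feature; the field types are guesses, so adjust them to the real log contents), combined with the StreamDeserializer so a file containing two runs still works:

use serde::Deserialize;
use serde_json::Deserializer;

// Only the fields we care about; serde skips everything else in the JSON.
#[derive(Debug, Deserialize)]
struct LogEntry {
    archive: Archive,
}

#[derive(Debug, Deserialize)]
struct Archive {
    name: String,
    duration: f64,
    stats: Stats,
}

#[derive(Debug, Deserialize)]
struct Stats {
    nfiles: u64,
    compressed_size: u64,
    original_size: u64,
}

fn main() -> Result<(), serde_json::Error> {
    // Stand-in for a log file containing two concatenated backup runs.
    let data = r#"
        {"archive":{"name":"homes","duration":12.5,"stats":{"nfiles":100,"compressed_size":10,"original_size":20}},"extra":"ignored"}
        {"archive":{"name":"etc","duration":3.0,"stats":{"nfiles":50,"compressed_size":5,"original_size":8}}}
    "#;
    for entry in Deserializer::from_str(data).into_iter::<LogEntry>() {
        let entry = entry?;
        println!("{}: {} files", entry.archive.name, entry.archive.stats.nfiles);
    }
    Ok(())
}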

2 Likes