Hi everyone,
I'm just a couple of days into Rust, and I'm struggling to understand how it is possible to implement the following scenario:
Scenario
- There is a large text file, on the scale of gigabytes.
- Search this file forwards for some pattern (pattern1).
- If pattern1 is found, search backwards for another pattern (pattern2).
- If pattern2 is found, do something, go back to step 1, and proceed forward from the line after the previous match for pattern1. Repeat until EOF.
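For illustration, the steps above could be sketched roughly as follows on an in-memory string, using only `str::find` and `str::rfind` from the standard library. This is a minimal sketch, not a definitive implementation. The function name `find_with_context` is hypothetical. It assumes that the data is already available as a single `&str` (for example because the file was read or memory-mapped beforehand, which is not shown), that lines end with `\n`, and that pattern2 sits on a line before the pattern1 match.

```rust
/// Scans `haystack` forwards for `pattern1`. For every hit, searches backwards
/// through the preceding lines for the nearest `pattern2`.
/// Returns (line containing pattern2, line containing pattern1) pairs.
/// Assumption: lines are separated by '\n' (a trailing '\r' is not stripped).
fn find_with_context<'a>(
    haystack: &'a str,
    pattern1: &str,
    pattern2: &str,
) -> Vec<(&'a str, &'a str)> {
    let mut results = Vec::new();
    let mut pos = 0; // byte offset where the next forward search starts

    while pos < haystack.len() {
        // Forward search for pattern1, starting at `pos`.
        let hit = match haystack[pos..].find(pattern1) {
            Some(rel) => pos + rel,
            None => break,
        };

        // Boundaries of the line that contains the pattern1 match.
        let line_start = haystack[..hit].rfind('\n').map_or(0, |i| i + 1);
        let line_end = haystack[hit..].find('\n').map_or(haystack.len(), |i| hit + i);

        // Backward search: last occurrence of pattern2 before the matching line.
        if let Some(h) = haystack[..line_start].rfind(pattern2) {
            let h_start = haystack[..h].rfind('\n').map_or(0, |i| i + 1);
            let h_end = haystack[h..].find('\n').map_or(haystack.len(), |i| h + i);
            // "Do something": here the two lines are simply collected.
            results.push((&haystack[h_start..h_end], &haystack[line_start..line_end]));
        }

        // Proceed forward from the line after the pattern1 match.
        pos = line_end + 1;
    }
    results
}
```

Calling `find_with_context(text, "pattern1", "pattern2")` would return one pair per pattern1 match that has a pattern2 somewhere before it. Matches with no preceding pattern2 are skipped.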
What I have tried
In Perl, such a task is very easy to implement with 'Tie::File', which maps the lines of a file to an array. However, it is too slow for me, as I have many such files to search (I collect them with 'glob') and have to do it regularly.
What I have now is a forward-only lookup. It works, but I'm not sure that this is the right way to proceed:
extern crate glob;
use glob::glob;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;

fn main() {
    let path_glob = "/some/*/path/*.txt".to_string();
    // Iterate over every path that matches the glob pattern.
    for entry in glob(&path_glob).expect("Failed to read glob pattern") {
        match entry {
            Ok(path) => {
                println!("{:?}", path.display());
                let f = File::open(path).expect("Can't read file");
                let file = BufReader::new(&f);
                // Forward-only scan, one line at a time.
                for line in file.lines() {
                    match line {
                        Ok(valid_utf8) => {
                            if valid_utf8.contains("pattern1") {
                                println!("{}", valid_utf8);
                            }
                        }
                        // lines() returns Err for I/O errors as well as for invalid UTF-8.
                        Err(e) => {
                            println!("Invalid UTF-8: {}", e.to_string())
                        }
                    };
                }
            }
            Err(e) => println!("{:?}", e),
        }
    }
}
I honestly tried to google it, but it seems that I lack some fundamental understanding needed to formulate effective search keywords.
Any help would be greatly appreciated!
Some additional info
There is a reason why I want to perform the search in the sequence described above.
Let’s say the text file has 10 million lines. In this case pattern2 will be found every 5-10 lines on average, while pattern1 may be found just a handful of times or not at all. Since I have a lot of such big files, I am looking for every possibility to make my search as fast as possible.
The text files are application logs. pattern1 represents some data that needs to be found in the context of a specific application action. pattern2 represents the log record header, which contains the action uid and the action step number.
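Since pattern2 is expected every 5-10 lines, one possible direction is to keep only a small window of recent lines while streaming the file, and to scan that window backwards only when pattern1 is found. The following is a rough sketch under that assumption, not a definitive implementation. The function name `search_stream` and the `window` parameter are hypothetical, and the window must be chosen large enough to reach the previous header.

```rust
use std::collections::VecDeque;
use std::io::BufRead;

/// Streams lines from `reader`, remembering at most `window` previous lines.
/// When a line contains `pattern1`, the remembered lines are scanned
/// newest-first for `pattern2`. Only preceding lines are checked, not the
/// matching line itself.
/// Returns (line containing pattern2, line containing pattern1) pairs.
/// Note: `lines()` returns an error on invalid UTF-8, which is propagated here.
fn search_stream<R: BufRead>(
    reader: R,
    pattern1: &str,
    pattern2: &str,
    window: usize,
) -> std::io::Result<Vec<(String, String)>> {
    let mut recent: VecDeque<String> = VecDeque::with_capacity(window);
    let mut results = Vec::new();

    for line in reader.lines() {
        let line = line?;

        // pattern2 is only looked for after pattern1 has matched,
        // so the common case costs one `contains` call per line.
        if line.contains(pattern1) {
            if let Some(header) = recent.iter().rev().find(|l| l.contains(pattern2)) {
                results.push((header.clone(), line.clone()));
            }
        }

        // Maintain the bounded window of previous lines.
        if window > 0 {
            if recent.len() == window {
                recent.pop_front();
            }
            recent.push_back(line);
        }
    }
    Ok(results)
}
```

With the `BufReader` from the code above, this could be called as `search_stream(file, "pattern1", "pattern2", 32)`. The value 32 is only an illustrative guess for a window that comfortably covers 5-10 lines between headers.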