Idiomatic way of reading a text file line by line in reverse

What would be the most idiomatic way to read a text file line by line in reverse? I tried calling rev() on BufRead#lines(), but that does not work. Any ideas?

What problem are you trying to solve? i.e., Why do you need to read the lines of a file in reverse?

I expect you will need to roll your own if you really need it. It is quite an uncommon use case. You'll have to, for example, use seek calls to position the file pointer in the correct place. Using BufReader alone won't work because it reads the file in normal order.

It's a rare problem. The standard file APIs only read forward, so there is no direct way to read a file from the end.

You need to do the following (or something like it).

use std::io::{BufRead, BufReader};
use std::fs::File;

fn main() {
    let file = BufReader::new(File::open("read_file.rs").unwrap());
    // Collect every line into memory, then reverse the order.
    let mut lines: Vec<_> = file.lines().map(|line| line.unwrap()).collect();
    lines.reverse();

    for line in &lines {
        println!("{}", line);
    }
}

Perhaps reading to a string is more space-efficient.

use std::fs::File;
use std::io::Read;

fn main() {
    let mut file = File::open("foo.txt").expect("opening file");
    let mut text = String::new();
    file.read_to_string(&mut text).expect("reading file");

    for line in text.lines().rev() {
        println!("{}", line);
    }
}

Thanks for the replies and examples!

I'm looking to read log files in reverse and be able to stop when I reach a specific time.

The given examples are not really usable for a big file, because they read the entire file to the end before yielding the last line.

You will have to seek to some position close to the end of the file, read a chunk, do with it whatever you need to do, and repeat.

No guarantees here, I'm just scribbling this down for reference:

use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

let mut f = File::open("foo.txt").unwrap();
// Determine file size.
let file_size = f.metadata().unwrap().len();
// Read the last 10240 bytes. Adapt this for your requirements.
let chunk_size: u64 = 10240;
let start_pos = file_size.saturating_sub(chunk_size);
// Seek to end of file - chunk_size.
f.seek(SeekFrom::Start(start_pos)).unwrap();
// Read the chunk as raw bytes. by_ref() keeps `f` usable for the next seek.
let mut bytes = Vec::new();
f.by_ref().take(chunk_size).read_to_end(&mut bytes).unwrap();
// The chunk may start in the middle of a UTF-8 character, so decode lossily.
let buffer = String::from_utf8_lossy(&bytes);

let mut lines = buffer.lines();

// The first line is probably a partial line now, cut off in the middle.
// Save it for now...
let first_line = lines.next().unwrap_or("");

// Do whatever with the actual lines, newest first.
for line in lines.rev() { println!("{}", line); }

// Repeat process by seeking back by chunk_size again.
// Note that you now have to merge the last line of the new buffer contents with the previous first_line...
.....
...

If you wrap this up in a nice struct it's not so bad. :wink:
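As a rough illustration of what such a struct could look like, here is a minimal sketch. It is not the code of any crate mentioned in this thread; the names `RevLines`, `read_prev_chunk`, and `decode` are made up for the example. It assumes `\n` line endings, works on raw bytes, and only decodes UTF-8 once a complete line has been assembled.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical reverse line reader over any `Read + Seek` source.
pub struct RevLines<R> {
    reader: R,
    pos: u64,         // offset where `pending` starts; bytes before it are unread
    pending: Vec<u8>, // bytes read so far that have not been returned yet
    chunk_size: u64,
    done: bool,
}

impl<R: Read + Seek> RevLines<R> {
    pub fn new(mut reader: R, chunk_size: u64) -> io::Result<Self> {
        let mut pos = reader.seek(SeekFrom::End(0))?;
        let done = pos == 0; // an empty input has no lines
        if pos > 0 {
            // Ignore a single trailing '\n' so it does not produce an empty line.
            reader.seek(SeekFrom::Start(pos - 1))?;
            let mut last = [0u8; 1];
            reader.read_exact(&mut last)?;
            if last[0] == b'\n' {
                pos -= 1;
            }
        }
        let chunk_size = chunk_size.max(1);
        Ok(RevLines { reader, pos, pending: Vec::new(), chunk_size, done })
    }

    /// Reads the chunk that ends at `pos` and prepends it to `pending`.
    fn read_prev_chunk(&mut self) -> io::Result<()> {
        let len = self.chunk_size.min(self.pos);
        let start = self.pos - len;
        self.reader.seek(SeekFrom::Start(start))?;
        let mut chunk = vec![0u8; len as usize];
        self.reader.read_exact(&mut chunk)?;
        chunk.extend_from_slice(&self.pending);
        self.pending = chunk;
        self.pos = start;
        Ok(())
    }
}

/// Decodes one complete line; invalid UTF-8 becomes an `InvalidData` error.
fn decode(line: Vec<u8>) -> io::Result<String> {
    String::from_utf8(line).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

impl<R: Read + Seek> Iterator for RevLines<R> {
    type Item = io::Result<String>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        loop {
            // Everything after the last '\n' in `pending` is a complete line.
            if let Some(i) = self.pending.iter().rposition(|&b| b == b'\n') {
                let line = self.pending.split_off(i + 1);
                self.pending.pop(); // drop the '\n' itself
                return Some(decode(line));
            }
            // No newline left: at offset 0 the remainder is the first line.
            if self.pos == 0 {
                self.done = true;
                return Some(decode(std::mem::take(&mut self.pending)));
            }
            if let Err(e) = self.read_prev_chunk() {
                self.done = true;
                return Some(Err(e));
            }
        }
    }
}
```

Since the iterator is lazy, combining it with something like `take_while` on a parsed timestamp should stop reading as soon as an older log entry shows up, without touching the rest of the file. Prepending each chunk to `pending` is simple but copies data repeatedly, so very long lines would be slow with this sketch.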

Check out bounded_tail, which does something similar under the hood: https://github.com/uutils/coreutils/blob/master/src/tail/tail.rs

Thanks for the replies all! They are helping me develop a small Iterator that reads a file line-by-line in reverse using an internal buffer.

Nice. Might be useful to package that up into its own crate.

I put together a crate for this and published it:

https://crates.io/crates/rev_lines

Sent you a pull request to generalize it to work on any Seek + Read type, not just files. Thanks for making it. Should be useful for the occasional need.

Oh, and I think your Travis tests are failing due to the documentation code. I had the same problem running cargo test locally, and deleting your code comments fixed it.

Thanks for the PR! Fixed up the docs and released another version.

Hey all, adding a note here for future people who come across this thread: the solutions here are fragile since they mix UTF-8 decoding with byte-based seeking. This approach is valid if the input is ASCII, but then UTF-8 decoding should not be used. I wrote more about this in "Issues with non-ASCII characters?" (Issue #3, mjc-gh/rev_lines on GitHub).
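To make the failure mode concrete, here is a small sketch. The helper `decode_from` is hypothetical and just mimics what happens when a reader seeks to an arbitrary byte offset and then decodes from there.

```rust
/// Decodes `text` starting at byte `offset`, the way a seek-then-decode reader would.
fn decode_from(text: &str, offset: usize) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(&text.as_bytes()[offset..])
}
```

With pure ASCII input every offset decodes fine. With "naïve", where the "ï" takes two bytes, offset 2 works but offset 3 lands inside the character and decoding returns an error.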

Apologies for bumping this thread to the top of the list. My primary intention is not to bring attention to this thread, but to help other folks who come across it in the future. That said, I would be interested to hear if folks have already written correct solutions for parsing a suffix of a UTF-8 file backwards.

Finding appropriate break points in UTF-8 is not a difficult task. Leading bytes and trailing bytes are distinct sets, and a leading byte tells you exactly how many trailing bytes follow. So, from the beginning of a chunk, you can simply read until you reach a leading byte, reserve the bytes just read for the end of the previous chunk (which will be read next), and then parse the rest of the chunk as normal UTF-8.
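A minimal sketch of that boundary search, assuming the chunk is a slice of an otherwise valid UTF-8 file. The function names `first_char_boundary` and `split_chunk` are invented for this example. Trailing (continuation) bytes always have the bit pattern `10xxxxxx`, so anything else starts a character.

```rust
/// Index of the first byte in `chunk` that is not a UTF-8 continuation byte.
/// Returns `chunk.len()` if the chunk contains only continuation bytes.
fn first_char_boundary(chunk: &[u8]) -> usize {
    chunk
        .iter()
        .position(|&b| (b & 0b1100_0000) != 0b1000_0000)
        .unwrap_or(chunk.len())
}

/// Splits a chunk into the leftover bytes that belong to the end of the
/// previous chunk in the file and the part that starts on a character boundary.
fn split_chunk(chunk: &[u8]) -> (&[u8], &[u8]) {
    chunk.split_at(first_char_boundary(chunk))
}
```

The leftover part is at most three bytes long for valid UTF-8, since a character is encoded in at most four bytes.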