Idiomatic way of reading a text file line by line in reverse

mikeycgto · December 27, 2016, 1:51am

What would be the most idiomatic way to read a text file line by line in reverse? I tried calling rev() on BufRead#lines(), but that does not work. Any ideas?

BurntSushi · December 27, 2016, 2:35am

What problem are you trying to solve? i.e., Why do you need to read the lines of a file in reverse?

I expect you will need to roll this on your own if you really need it. It is quite an uncommon use case. You'll have to, for example, use seek calls to position the file pointer in the correct place. Using BufReader won't work because it reads the file in normal order.

asit-dhal · December 27, 2016, 2:52am

It's a rare problem. The standard library doesn't offer a way to iterate over a file's lines starting from the end.

You need to do the following (or something like it).

use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() {
    let file = BufReader::new(File::open("read_file.rs").unwrap());
    // Read every line into memory, then reverse the order.
    let mut lines: Vec<_> = file.lines().map(|line| line.unwrap()).collect();
    lines.reverse();

    for line in &lines {
        println!("{}", line);
    }
}

sinkuu · December 27, 2016, 2:55am

Perhaps reading to a string is more space-efficient.

use std::fs::File;
use std::io::Read;

fn main() {
    let mut file = File::open("foo.txt").expect("opening file");
    let mut text = String::new();
    file.read_to_string(&mut text).expect("reading file");

    for line in text.lines().rev() {
        println!("{}", line);
    }
}

mikeycgto · December 27, 2016, 2:31pm

Thanks for the replies and examples!

I'm looking to read log files in reverse and be able to stop when I reach a specific time.
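
A minimal sketch of that stopping condition, assuming each log line begins with a sortable timestamp such as ISO 8601. The log format, the `lines_since` name and the sample data are made up for illustration. It walks the lines newest-first and stops at the first one older than the cutoff:

```rust
/// Hypothetical helper: returns the lines at or after `cutoff`, newest first.
/// Assumes every line starts with a timestamp that sorts as plain text.
fn lines_since<'a>(text: &'a str, cutoff: &str) -> Vec<&'a str> {
    text.lines()
        .rev()
        // Compare only the timestamp prefix; stop at the first older line
        // (or at a line too short to carry a timestamp).
        .take_while(|line| line.get(..cutoff.len()).map_or(false, |ts| ts >= cutoff))
        .collect()
}

fn main() {
    let log = "2016-12-27T01:00:00 boot\n\
               2016-12-27T02:00:00 login\n\
               2016-12-27T03:00:00 logout\n";
    for line in lines_since(log, "2016-12-27T02:00:00") {
        println!("{}", line);
    }
}
```

The same `take_while` works on any iterator that yields lines in reverse, so it should carry over to a chunked reader that avoids loading the whole file.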

theduke · January 14, 2017, 8:37pm

The given examples are not really usable for a big file, because they read the entire file into memory before yielding the last line.

You will have to seek to some position close to the end of the file, read a chunk, do with it whatever you need to do, and repeat.

No guarantees here, I'm just scribbling this down for reference:

use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

fn main() {
    let mut f = File::open("foo.txt").unwrap();
    let mut buffer = String::new();
    // Determine file size.
    let file_size = f.metadata().unwrap().len();
    // Read the last 10240 bytes. Adapt this for your requirements.
    let chunk_size: u64 = 10240;
    let start_pos = file_size.saturating_sub(chunk_size);
    // Seek to end of file - chunk_size.
    f.seek(SeekFrom::Start(start_pos)).unwrap();
    // Read to buffer. by_ref() borrows the file so it can be reused afterwards.
    // Note: this fails if start_pos lands in the middle of a multi-byte UTF-8 character.
    f.by_ref().take(chunk_size).read_to_string(&mut buffer).unwrap();

    let mut lines = buffer.lines();

    // The first line is probably a partial line now, cut off in the middle.
    // Save it for now...
    let first_line = lines.next().unwrap_or("");

    // Do whatever with the actual lines.
    for line in lines {
        println!("{}", line);
    }

    // Repeat the process by seeking back by chunk_size again.
    // Note that you now have to merge the last line of the new buffer contents with the previous first_line...
    // ...
}

If you wrap this up in a nice struct it's not so bad.
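
One way such a struct could look, as a rough sketch rather than a finished implementation. The `RevLines` name, its fields and the chunk size are all made up here. To sidestep decoding problems at chunk boundaries it yields raw bytes and leaves decoding to the caller, and it treats only `\n` as a line terminator (a `\r` is left in place):

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical reverse line reader: yields lines last-to-first as raw bytes.
struct RevLines<R> {
    src: R,
    pos: u64,     // offset where the still-unread part of the source ends
    buf: Vec<u8>, // bytes already read but not yet returned, in file order
    chunk: u64,   // how many bytes to read per step
    done: bool,
}

impl<R: Read + Seek> RevLines<R> {
    fn new(mut src: R, chunk: u64) -> io::Result<Self> {
        let mut pos = src.seek(SeekFrom::End(0))?;
        let done = pos == 0; // an empty source has no lines
        if pos > 0 {
            // Treat a final '\n' as a terminator, not as an extra empty line.
            let mut last = [0u8; 1];
            src.seek(SeekFrom::Start(pos - 1))?;
            src.read_exact(&mut last)?;
            if last[0] == b'\n' {
                pos -= 1;
            }
        }
        Ok(RevLines { src, pos, buf: Vec::new(), chunk: chunk.max(1), done })
    }

    /// Reads the chunk that ends at `pos` and puts it in front of `buf`.
    fn fill(&mut self) -> io::Result<()> {
        let len = self.chunk.min(self.pos);
        self.pos -= len;
        self.src.seek(SeekFrom::Start(self.pos))?;
        let mut block = vec![0u8; len as usize];
        self.src.read_exact(&mut block)?;
        block.extend_from_slice(&self.buf);
        self.buf = block;
        Ok(())
    }
}

impl<R: Read + Seek> Iterator for RevLines<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        loop {
            // Everything after the last '\n' in the buffer is a complete line.
            if let Some(i) = self.buf.iter().rposition(|&b| b == b'\n') {
                let line = self.buf.split_off(i + 1);
                self.buf.truncate(i); // drop the '\n' itself
                return Some(Ok(line));
            }
            if self.pos == 0 {
                // Start of the source reached: the remainder is the first line.
                self.done = true;
                return Some(Ok(std::mem::take(&mut self.buf)));
            }
            if let Err(e) = self.fill() {
                self.done = true;
                return Some(Err(e));
            }
        }
    }
}

fn main() -> io::Result<()> {
    // A Cursor stands in for a file; a tiny chunk size exercises the merging.
    let data = Cursor::new(b"first\nsecond\nthird\n".to_vec());
    for line in RevLines::new(data, 4)? {
        println!("{}", String::from_utf8_lossy(&line?));
    }
    Ok(())
}
```

Prepending each chunk copies the buffer, which is fine for a sketch but gets slow for lines much longer than the chunk size.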

fitzgen · January 16, 2017, 1:31am

Check out bounded_tail in https://github.com/uutils/coreutils/blob/master/src/tail/tail.rs, which does something similar under the hood.

mikeycgto · January 17, 2017, 3:28pm

Thanks for the replies all! They are helping me develop a small Iterator that reads a file line-by-line in reverse using an internal buffer.

tupshin · January 17, 2017, 3:44pm

Nice. Might be useful to package that up into its own crate.

mikeycgto · February 1, 2017, 1:06am

I put together a crate for this and published it:

https://crates.io/crates/rev_lines

tupshin · February 1, 2017, 1:51am

Sent you a pull request to generalize it to work on any Seek + Read type rather than just files. Thanks for making it. Should be useful for the occasional need.
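
To illustrate what that generalization buys, here is a small sketch with a made-up `tail_bytes` function (not taken from the crate or the pull request). Because it is bounded only by `Read + Seek`, the same code should accept a `File` or an in-memory `Cursor`, which is handy for tests:

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Hypothetical helper: returns up to the last `n` bytes of any seekable reader.
fn tail_bytes<R: Read + Seek>(src: &mut R, n: u64) -> io::Result<Vec<u8>> {
    let len = src.seek(SeekFrom::End(0))?;
    // Clamp to the start when the source is shorter than `n`.
    let start = len.saturating_sub(n);
    src.seek(SeekFrom::Start(start))?;
    let mut out = Vec::with_capacity((len - start) as usize);
    src.read_to_end(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    // In-memory data; a std::fs::File could be passed the same way.
    let mut in_memory = Cursor::new(b"alpha\nbeta\ngamma\n".to_vec());
    let tail = tail_bytes(&mut in_memory, 6)?;
    println!("{:?}", String::from_utf8_lossy(&tail));
    Ok(())
}
```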

Oh, and I think your Travis tests are failing due to the documentation code. I had the same problem running cargo test locally, and deleting your code comments fixed it.

mikeycgto · February 1, 2017, 4:00am

Thanks for the PR! Fixed up the docs and released another version.

mgsloan · August 15, 2019, 3:33am

Hey all, adding a note here for future people who come across this thread: the solutions here are fragile since they mix UTF-8 decoding with byte-based seeking. Byte-based seeking is perfectly valid if the input is ASCII, but then UTF-8 decoding should not be used. I wrote more about this in rev_lines issue #3, "Issues with non-ASCII characters?" (mjc-gh/rev_lines on GitHub).

Apologies for bumping this thread to the top of the list. My primary intention is not to bring attention to it, but to help other folks who come across it in the future. That said, I would be interested to hear if folks have already written correct solutions for parsing a suffix of a UTF-8 file backwards.
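
A small demonstration of the failure mode, using a made-up `decodes_cleanly` helper and sample text. A chunk that happens to start on a character boundary decodes fine, while one that starts one byte later, inside a two-byte character, is rejected:

```rust
use std::str;

/// Hypothetical helper: true if `bytes` is valid UTF-8 on its own.
fn decodes_cleanly(bytes: &[u8]) -> bool {
    str::from_utf8(bytes).is_ok()
}

fn main() {
    // "\u{e9}" is 'é', which takes two bytes (0xC3 0xA9) in UTF-8.
    let text = "caf\u{e9}\nna\u{ef}ve\n";
    let bytes = text.as_bytes();

    // Offset 3 is the first byte of 'é'; offset 4 is its second byte.
    let aligned = &bytes[3..];
    let misaligned = &bytes[4..];

    println!("aligned chunk decodes: {}", decodes_cleanly(aligned));
    println!("misaligned chunk decodes: {}", decodes_cleanly(misaligned));
}
```

With a fixed chunk size, whether a seek lands on a boundary depends entirely on the file contents, which is why ASCII-only test data tends to hide the problem.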

cliff · August 15, 2019, 4:05am

Finding appropriate break points in UTF-8 is not a difficult task. Leading bytes and trailing (continuation) bytes are distinct sets, and a leading byte tells you exactly how many trailing bytes follow. So, from the beginning of a chunk, you can simply read until you reach a leading byte, reserve all the just-read bytes for the end of the previous chunk (which will be read next), and then parse the rest of the chunk as normal UTF-8.
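
A sketch of that boundary search, with a made-up `split_at_boundary` name. It relies on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`, so any other byte (including plain ASCII) starts a character:

```rust
use std::str;

/// Hypothetical helper: splits a chunk into (orphaned continuation bytes that
/// belong to the end of the previous chunk, the rest starting on a boundary).
fn split_at_boundary(chunk: &[u8]) -> (&[u8], &[u8]) {
    let start = chunk
        .iter()
        // Continuation bytes look like 0b10xx_xxxx; skip them.
        .position(|&b| (b & 0xC0) != 0x80)
        .unwrap_or(chunk.len());
    chunk.split_at(start)
}

fn main() {
    // Three 3-byte characters followed by a newline.
    let text = "\u{65e5}\u{672c}\u{8a9e}\n";
    // Simulate a seek that landed one byte into the first character.
    let chunk = &text.as_bytes()[1..];

    let (carry, rest) = split_at_boundary(chunk);
    println!("bytes to hand to the previous chunk: {}", carry.len());
    println!("rest decodes: {}", str::from_utf8(rest).is_ok());
}
```

This assumes the input is valid UTF-8 overall; on arbitrary bytes the decode of the remainder can still fail and should be handled as an error.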
