Mapping large files


#1

Hi all, first post and brand new to Rust. So far, I LOVE Rust and I’m coming from a mostly C background.

I need to map some huge text files, anywhere from 75 MB to 5 GB, scan each line to see if it matches a regex pattern, and if it does, print that line and its line number.

It's not that difficult to do, but since I’m new to the language, I wanted to ask which data type and allocation strategy would be best for this. Should I just use read_to_string on the entire file? Use an array or a vector? Which features of Rust would help with performance in a program like this? Thanks so much. Really enjoying Rust so far.


#2

I’d guess a memory map would be your best bet when dealing with files that large. Be aware, though, that creating a memory map is considered unsafe in Rust: another process can modify or truncate the file at any time while it is mapped, so the bytes you are reading can change underneath you.
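
For comparison, here is a rough std-only sketch that does not memory-map at all: it streams the file through a BufReader and reuses a single String for every line, so memory use stays flat even on a 5 GB file. The function name scan_lines, the 1 MiB buffer size, and the "file:substring" command line are made up for illustration, and a plain substring test stands in for the regex (the regex crate's Regex::is_match could be passed as the predicate instead). Note that read_line returns an error on invalid UTF-8, so this assumes the files really are UTF-8 text.

```rust
use std::env;
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Reads `reader` one line at a time and calls `on_hit(line_number, line)`
/// for every line accepted by `matches`. Line numbers start at 1 and the
/// trailing "\n" or "\r\n" is stripped before matching.
/// (Hypothetical helper, not part of any library.)
fn scan_lines<R: BufRead>(
    mut reader: R,
    matches: impl Fn(&str) -> bool,
    mut on_hit: impl FnMut(usize, &str),
) -> io::Result<()> {
    let mut line = String::new(); // one buffer, reused for every line
    let mut line_no = 0usize;
    loop {
        line.clear();
        // read_line returns Ok(0) only at end of input.
        if reader.read_line(&mut line)? == 0 {
            return Ok(());
        }
        line_no += 1;
        let text = line.trim_end_matches(&['\n', '\r'][..]);
        if matches(text) {
            on_hit(line_no, text);
        }
    }
}

fn main() -> io::Result<()> {
    let mut args = env::args().skip(1);
    let (path, needle) = match (args.next(), args.next()) {
        (Some(p), Some(n)) => (p, n),
        _ => {
            eprintln!("usage: scan <file> <substring>");
            return Ok(());
        }
    };
    // A larger-than-default buffer (1 MiB here, an arbitrary choice)
    // cuts down on read system calls for big files.
    let reader = BufReader::with_capacity(1 << 20, File::open(&path)?);
    scan_lines(
        reader,
        |l| l.contains(needle.as_str()), // stand-in for a regex match
        |n, l| println!("{}:{}", n, l),
    )
}
```

Whether this or a memory map is faster will depend on the files and the OS, so it is worth measuring both before committing to one.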


#3

Is it a learning exercise, or a serious project? If it's a serious project, I’d try to use ripgrep’s internals, which are published as library crates such as grep-searcher and grep-regex.


#4

Serious project. Thank you!