How to modify content inside a file following a match

ilmoi · September 23, 2021, 6:24pm

I have a text file that looks like this:

aaa
bbb
ccc
ddd
EEE
f
// stuff here
f
f
// stuff here
f

I need to:

Locate the line that contains "EEE"
Locate the 4x "f"s following the EEE (there are many more "f"s in the doc - I specifically need the 4 that follow).
Replace content between the 1st and 2nd f, as well as 3rd and 4th. Note that old and new content may or may not be the same size (number of lines).

What's the best way to do this? Just pointing me to the right libraries / tutorials would be enough. I just don't really know where to start:)

jameseb7 · September 23, 2021, 6:49pm

First, you should open the file and read it into a String (assuming the file is entirely ASCII or UTF-8 text). The convenience function std::fs::read_to_string() is probably easiest for this, but for future reference you can instead use the functions under OpenOptions and File to open and read the file manually. Then you should use the methods on String to edit the returned String. The methods match_indices() (which produces an iterator over all the indices where there is a match against a particular pattern) and replace_range() may be particularly useful for finding and replacing the relevant portions of the String. If the replacement text is the same in each case, replace() may be a more efficient approach. For this sort of use case you may also consider a regular expression library like regex. Then you just need to write the text back to the file with something like std::fs::write().
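As a minimal sketch of the in-memory approach described above: the function name replace_between_markers and its parameters are made up for illustration, and it assumes each "f" marker sits alone on its own line. That assumption matters, because a plain match_indices("f") would also hit the letters inside "// stuff here". The sketch therefore uses match_indices() only to find the anchor, walks the lines by hand to find the markers, and then edits with replace_range().

```rust
/// Replace the text between the 1st and 2nd, and between the 3rd and 4th,
/// marker lines that follow the first occurrence of `anchor`.
/// Returns None if the anchor or four marker lines cannot be found.
/// (Hypothetical helper, not part of any library.)
fn replace_between_markers(
    text: &str,
    anchor: &str,
    marker: &str,
    replacements: [&str; 2],
) -> Option<String> {
    // Byte offset of the first occurrence of the anchor text.
    let (anchor_pos, _) = text.match_indices(anchor).next()?;

    // Byte ranges (start, end) of the first four lines after the anchor
    // line whose whole content is the marker.
    let mut marker_lines: Vec<(usize, usize)> = Vec::new();
    let mut offset = 0;
    for line in text.split_inclusive('\n') {
        let start = offset;
        offset += line.len();
        if start > anchor_pos && line.trim_end_matches(['\r', '\n']) == marker {
            marker_lines.push((start, offset));
            if marker_lines.len() == 4 {
                break;
            }
        }
    }
    if marker_lines.len() < 4 {
        return None;
    }

    let mut out = text.to_string();
    // Edit the later gap first so the earlier byte offsets stay valid.
    out.replace_range(marker_lines[2].1..marker_lines[3].0, replacements[1]);
    out.replace_range(marker_lines[0].1..marker_lines[1].0, replacements[0]);
    Some(out)
}
```

The caller would read the file with std::fs::read_to_string(), pass the text through this function, and write the result back with std::fs::write(). Replacement strings should end with a newline so the following marker stays on its own line.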

jkugelman · September 23, 2021, 6:49pm

Modifying files in place isn't easy. Text editors make it look easy, but files are like arrays. Inserting or deleting content in the middle means you have to shift everything afterwards around.

Tools that modify files usually take one of two approaches:

Read the entire file into memory, modify an in-memory buffer, then write the whole thing back out when saving.
Read the input file, write a modified version to a temporary file, then if the entire write is successful move the temp file over top of the original.

The second approach has a couple of nice properties. It can be done in a streaming manner without reading the whole file in at once, avoiding O(file size) memory usage. It avoids corruption if the write fails partway through; the original file is still intact. And moving a file within the same filesystem is typically an atomic operation, so other programs won't see the update until it's finished.
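In Rust, the "write to a temporary file, then move it over the original" step might look roughly like the sketch below. The function name write_atomically and the ".tmp" suffix are assumptions made for illustration. Placing the temporary file next to the original keeps both on the same filesystem, which is what the rename relies on.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Write `contents` to a temporary sibling of `path`, then rename it over
/// `path`. (Hypothetical helper; the ".tmp" suffix is an arbitrary choice.)
fn write_atomically(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Build "<path>.tmp" in the same directory as the target.
    let mut tmp_name = path.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp_path = PathBuf::from(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(contents)?;
    // Ask the OS to flush the data to disk before the rename.
    tmp.sync_all()?;
    drop(tmp);

    // The original stays untouched until this call succeeds.
    fs::rename(&tmp_path, path)
}
```

If any step before the rename fails, the original file is left as it was, although a stray temporary file may remain and need cleaning up.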

This is all language agnostic. For some Rust-specific starting points see:

ilmoi · September 23, 2021, 7:03pm

Interesting - so would I do something like this:

search for the line containing EEE
once found, save the remaining contents of the file to a temp variable
search through the contents, this time for f using match_indices()
replace_range() between 1st and 2nd, and 3rd and 4th

Does that sound reasonable?

jameseb7 · September 23, 2021, 7:13pm

The suggestions I gave were mainly oriented toward the "Read the entire file into memory, modify an in-memory buffer" approach @jkugelman suggested: you would search for "EEE" in the buffer and do the editing there. If the file you're working with is particularly large, you could instead read just the part after "EEE" into the buffer, do the replacing, then write it back to the file at the appropriate point. That makes the use of the file access APIs in the standard library a little more complicated, though: you would be manually reading a little at a time, then you would need to open the file for editing and move to the right point before writing back the buffer (and reduce the file length if the replacement is smaller than the original). Reading the entire file into memory is somewhat easier to work with.
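For reference, the "read only the part after EEE" variant could be sketched as follows. The name rewrite_tail and the closure parameter edit are hypothetical. The sketch opens the file for reading and writing, scans to the end of the anchor line, loads only the remainder, and then seeks back, writes, and adjusts the length with set_len(). Unlike the temp-file approach, a failure partway through the write can leave the file damaged.

```rust
use std::fs::OpenOptions;
use std::io::{self, BufRead, BufReader, Read, Seek, SeekFrom, Write};
use std::path::Path;

/// Rewrite everything after the first line containing `anchor` by passing
/// it through `edit`. Returns Ok(false) if the anchor is never found.
/// (Hypothetical helper, shown only to illustrate the file APIs involved.)
fn rewrite_tail(
    path: &Path,
    anchor: &str,
    edit: impl FnOnce(&str) -> String,
) -> io::Result<bool> {
    let file = OpenOptions::new().read(true).write(true).open(path)?;
    let mut reader = BufReader::new(file);

    // Read line by line, tracking the byte offset just past the anchor line.
    let mut tail_start: u64 = 0;
    let mut line = String::new();
    let mut found = false;
    loop {
        line.clear();
        let n = reader.read_line(&mut line)?;
        if n == 0 {
            break; // reached end of file
        }
        tail_start += n as u64;
        if line.contains(anchor) {
            found = true;
            break;
        }
    }
    if !found {
        return Ok(false);
    }

    // Only the part after the anchor line is held in memory.
    let mut tail = String::new();
    reader.read_to_string(&mut tail)?;
    let new_tail = edit(&tail);

    // Move back to where the tail began, overwrite it, and fix the length
    // in case the new tail is shorter than the old one.
    let mut file = reader.into_inner();
    file.seek(SeekFrom::Start(tail_start))?;
    file.write_all(new_tail.as_bytes())?;
    file.set_len(tail_start + new_tail.len() as u64)?;
    Ok(true)
}
```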

jkugelman · September 23, 2021, 8:04pm

I would go with approach #2 from my last post, reading and writing line-by-line in a streaming manner. Something like:

Open the input file in and a temporary output file out.
Loop #1: Read a line at a time until you hit EEE. Write each line to out.
Loop #2: Read a line at a time until you hit the 1st f. Write each line to out.
Write the first set of replacement content to out.
Loop #3: Read a line at a time until you hit the 2nd f. Discard these lines.
Loop #4: Read a line at a time until you hit the 3rd f. Write each line to out.
Write the second set of replacement content to out.
Loop #5: Read a line at a time until you hit the 4th f. Discard these lines.
Loop #6: Read a line at a time until you hit EOF. Write each line to out.
Rename out to in.

Why so many loops? It's effectively a state machine. The loops have slightly different actions based on where it's at in the input file. When it gets to the end, step #10 is where the changes are actually "committed". If it doesn't make it to step #10 then the input file is left unchanged.

I admit, this is rather longwinded compared to doing everything in memory. Still, it's a good exercise in doing things efficiently. Reading everything into a memory buffer and doing a couple of searches and replaces is easier to code up, but it'll use a lot more memory and probably do a bunch of passes over the file without you even realizing. Looping over the file by hand guarantees that everything's done in one pass and with minimal memory usage.

And hey, extract the repetitive logic into a helper function or two and the code won't even be that hard on the eyes.
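The streaming steps above can be sketched as a single loop with an explicit state enum instead of six separate loops. This is only one possible shape: the names State and rewrite_stream are invented for illustration, and the sketch assumes the anchor is matched by "line contains EEE" while each marker is a line that is exactly "f". It is generic over BufRead and Write, so opening the input file, creating the temporary file, and doing the final rename are left to the caller.

```rust
use std::io::{self, BufRead, Write};

/// Position in the input; each variant mirrors one of the loops above.
#[derive(Clone, Copy, PartialEq)]
enum State {
    BeforeAnchor,  // loop 1: copy until the anchor line
    BeforeFirst,   // loop 2: copy until the 1st marker
    InFirstBlock,  // loop 3: discard until the 2nd marker
    BeforeThird,   // loop 4: copy until the 3rd marker
    InSecondBlock, // loop 5: discard until the 4th marker
    Rest,          // loop 6: copy everything else
}

/// Copy `input` to `out`, swapping in `first` and `second` for the two
/// blocks between markers. Returns Ok(true) only if all four markers were
/// seen. Note that lines() strips line endings, so output always uses "\n".
fn rewrite_stream<R: BufRead, W: Write>(
    input: R,
    mut out: W,
    anchor: &str,
    marker: &str,
    first: &str,
    second: &str,
) -> io::Result<bool> {
    let mut state = State::BeforeAnchor;
    for line in input.lines() {
        let line = line?;
        let is_marker = line == marker;
        match state {
            State::BeforeAnchor => {
                writeln!(out, "{}", line)?;
                if line.contains(anchor) {
                    state = State::BeforeFirst;
                }
            }
            State::BeforeFirst | State::BeforeThird => {
                writeln!(out, "{}", line)?;
                if is_marker {
                    // Opening marker written; emit the replacement block.
                    if state == State::BeforeFirst {
                        out.write_all(first.as_bytes())?;
                        state = State::InFirstBlock;
                    } else {
                        out.write_all(second.as_bytes())?;
                        state = State::InSecondBlock;
                    }
                }
            }
            State::InFirstBlock | State::InSecondBlock => {
                // Old content is dropped; only the closing marker is kept.
                if is_marker {
                    writeln!(out, "{}", line)?;
                    state = if state == State::InFirstBlock {
                        State::BeforeThird
                    } else {
                        State::Rest
                    };
                }
            }
            State::Rest => writeln!(out, "{}", line)?,
        }
    }
    Ok(state == State::Rest)
}
```

A caller would wrap the input File in a BufReader, pass a BufWriter over the temporary file as out, and rename the temporary file over the original only when the function returns Ok(true).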

ilmoi · September 24, 2021, 2:34pm

Thanks @jameseb7 and @jkugelman - super helpful!
