Just learning Rust and came across an issue I've not been able to figure out.
When using the following code, the truncate(true) option seems to wipe out the file content before I can read it, make a change, and write it back. I want to read a file passed on the command line with a colon and a line number, parse out the filename and line number, and remove that line from the file.
example: fixkn known_hosts:33
Would remove line 33 from the file known_hosts.
I hope I was able to type this in correctly, as my dev system is not on the internet. When I run this I get no data read in if I have truncate set to true. If I comment out that option it works. I want to edit the known_hosts file, remove the offending line, and write the new content back to the same file, instead of vi'ing the file, removing the line, and saving.
Thank you for your help and time.
use std::env;
use std::fs::OpenOptions;
use std::io::{BufRead, BufReader};
use std::str::FromStr;

fn main() {
    let args: Vec<String> = env::args().collect();
    let values: Vec<&str> = args[1].split(':').collect();
    if values.len() == 1 {
        // ... handle a missing ":<line>" suffix
    }
    let file_name = values[0];
    let line_number = usize::from_str(values[1]).unwrap();

    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .truncate(true) // commenting this out makes the read work
        .open(file_name)
        .unwrap();

    let reader = BufReader::new(&file);
    let mut new_content = String::new();
    for (index, line) in reader.lines().enumerate() {
        let line = line.unwrap(); // Ignore errors.
        if index + 1 == line_number {
            println!("Removing line:{} | {}", index + 1, line);
        } else {
            new_content.push_str(&format!("{}\n", line));
        }
    }
    print!("{}", new_content);
}
I was thinking the truncation would have triggered on a write, or after a read, given that you can set read, write, and truncate all to true before opening. I came across examples that implied such behavior.
What would be the correct way to read all contents and write back to same file with previous contents removed?
The simplest but risky way is to call std::fs::File::set_len to set it to 0 after reading the contents. The risk is that if your program dies or is killed after set_len, the resulting file may be missing data.
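A minimal sketch of that in-place approach (the function name and 1-based line numbering are my assumptions, not anything from the original post):

```rust
use std::fs::OpenOptions;
use std::io::{Read, Seek, SeekFrom, Write};

fn remove_line_in_place(path: &str, line_number: usize) -> std::io::Result<()> {
    // Open without truncate so the existing contents survive the open.
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    let mut content = String::new();
    file.read_to_string(&mut content)?;

    // Keep every line except the 1-based `line_number`.
    let new_content: String = content
        .lines()
        .enumerate()
        .filter(|(i, _)| i + 1 != line_number)
        .map(|(_, line)| format!("{}\n", line))
        .collect();

    // Risky window: if the program dies between set_len and write_all,
    // the file is left empty or partially written.
    file.set_len(0)?;
    file.seek(SeekFrom::Start(0))?;
    file.write_all(new_content.as_bytes())?;
    Ok(())
}
```

Note the seek back to the start: set_len does not move the file cursor, so without it the write would leave a hole at the front of the file.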
A more robust approach is to std::fs::copy the file to a temp location, modify it there, then std::fs::rename it over the original.
As mentioned, editing in place carries the risk of losing half your file if the program terminates for any reason. Another downside is that other processes may observe the file in a partially-written state.
You can copy the file to a temp file and edit in place like @Cocalus suggested, or you can:
Open target file for reading
Open temp file for writing
Read each line from target and
Write to temp file or not
Close temp file when done to flush it
Replace target file with temp file
Which is what I recommend. I don't think you'll gain much from anything more complicated in this particular case.
(It may be faster to read the entire file, find the prefix and suffix in memory, then write the entire prefix and the entire suffix... but I still don't think this is a use case where performance matters at that level.)
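The steps listed above can be sketched like this (the function name and the ".tmp" suffix are my own choices; keeping the temp file next to the target matters because rename is only atomic within one filesystem):

```rust
use std::fs::{self, File};
use std::io::{BufRead, BufReader, BufWriter, Write};

fn remove_line_atomic(path: &str, line_number: usize) -> std::io::Result<()> {
    // Open target file for reading.
    let reader = BufReader::new(File::open(path)?);
    // Open temp file for writing, in the same directory as the target.
    let tmp_path = format!("{}.tmp", path);
    let mut writer = BufWriter::new(File::create(&tmp_path)?);

    // Read each line from the target and write it to the temp file,
    // skipping the 1-based `line_number`.
    for (index, line) in reader.lines().enumerate() {
        let line = line?;
        if index + 1 != line_number {
            writeln!(writer, "{}", line)?;
        }
    }
    // Flush the temp file before replacing the original with it.
    writer.flush()?;
    fs::rename(&tmp_path, path)?;
    Ok(())
}
```

Until the rename, the original file is untouched, so a crash at any earlier point leaves at worst a stray .tmp file behind.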
BTW, if you really want to be correct with respect to not losing any data in case of a crash or race condition, you need to insert some fsyncs in the right places.
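A sketch of where those fsyncs go, assuming a Unix/POSIX filesystem (the helper name is mine; fsyncing the parent directory via File::open on the directory is a Unix-specific trick and will not work on Windows):

```rust
use std::fs::{self, File};
use std::io::Write;
use std::path::Path;

// Durably replace `path` with `contents` via a temp file in the same directory.
fn write_durably(path: &Path, contents: &[u8]) -> std::io::Result<()> {
    let tmp = path.with_extension("tmp");
    let mut f = File::create(&tmp)?;
    f.write_all(contents)?;
    f.sync_all()?; // fsync the temp file's data and metadata to disk

    fs::rename(&tmp, path)?; // atomic on the same filesystem

    // fsync the parent directory so the rename itself survives a crash
    // (Unix-specific: directories can be opened and synced like files).
    if let Some(dir) = path.parent() {
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}
```

Without the directory fsync, a power loss right after the rename can still roll the directory entry back to the old file on some filesystems.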
File systems are a form of database. Just a really crappy one. In a lot of applications it's not practical to store things like text configs and resource descriptors in a database - it just hinders usability when the rest of the system is built around files. So I think it's important to learn how to correctly handle basic POSIX FS operations, like atomic and crash-resistant modification of file content.
I really wish Linux moved away from POSIX filesystems as the default persistence layer and introduced something else - designed more like a database, learning from the mistakes of the past. I doubt it will ever happen though, and I guess we will have to wait a couple of decades for OSes and computing paradigms to change altogether.
I'm not trying to argue the semantics of the definition of a "database". I'm trying to suggest that you use an existing, well-implemented, battle-tested system, which correctly enforces concurrency control, guarantees atomicity, durability, and transactionality, instead of re-implementing all of this by yourself.
When one has a bulk of data, one just uses a database, but a lot of system software needs to write out just a couple of text files with some simple state/configs, and for that serde + YAML + correct file handling is all that's necessary, and the user can view, edit, backup, restore, etc.
Whether you use a database doesn't primarily depend on the amount of the data, but on its complexity, i.e., whether it has structure and what kind, and how the software will use it.
"Files" are mainly a user-facing concept. If all you need is produce some output that people will consume, then of course you just write it to a file and call it a day. However, if your software needs to store state, read it back, and ensure its consistency, then that definitely is worth using one of the many available, fast, and lightweight embeddable databases.
I have worked on more than one piece of software (primarily mobile applications and bioinformatics tools) of which the original authors thought it would be a good idea to "just use files". As requirements changed and data accumulated in various, unforeseen ways, it quickly turned into a nightmare of, for example, 30-second startup times for a simple banking app with a trivial UI.
Ensuring consistency isn't as easy as tacking an fsync() on the end of the code, either. In practice, platforms, file systems and syscalls have long-standing bugs related to fsync and advisory locking. Database authors have gone through the trouble of finding out about them and then mitigating them as much as possible. There is no point in duplicating their work.