Rust newbi: Text in file manipulation

Github

One of the first rust libraries I have written. It's a lib to manipulate js/ts files.

Looking for feedback to get a better understanding of what I'm doing "wrong" so I can get better. Also if you know of links to get a better understanding of text manipulation I'm very interested.

Thank you for any look at this :slight_smile:

How it works:

  • Creates (with default data if no file exists) and reads the file.
  • Holds data as a string
  • Exposes two functions insert_between() and insert_after()
  • Tries to insert between if pattern found else goes for an after something pattern.
  • Function to save the modified content back to file system.

lib.rs

use anyhow::Result;
use std::io::{Read, Write};

#[derive(Debug)]
pub struct FileManipulator {
    pub content: String,
    pub file_path: String,
}

impl FileManipulator {
    pub fn new(file_path: String, default_content: &str) -> Result<Self> {
        let mut file = match std::fs::OpenOptions::new()
            .read(true)
            .write(true)
            .open(&file_path)
        {
            Ok(f) => f,
            Err(_e) => {
                let mut file = std::fs::OpenOptions::new()
                    .create_new(true)
                    .write(true)
                    .open(&file_path)?;
                file.write_all(default_content.as_bytes())?;
                std::fs::OpenOptions::new().read(true).open(&file_path)?
            }
        };
        let mut content = String::new();
        file.read_to_string(&mut content)?;
        Ok(FileManipulator {
            content: content,
            file_path: file_path,
        })
    }
    pub fn insert_between(
        &mut self,
        between: &str,
        content: &str,
        after: Option<&str>,
    ) -> Result<()> {
        let re = regex::Regex::new(between).expect("Invalid regex");
        let splitted: Vec<&str> = re.split(&self.content).map(|s| s).collect();
        if splitted.len() != 3 {
            if let Some(after) = after {
                match self.insert_after(after, content, between) {
                    Ok(_) => return Ok(()),
                    Err(e) => return Err(anyhow::anyhow!(e)),
                };
            }
            return Err(anyhow::anyhow!(
                "Between pattern had {} partitions. Could not determine between. Maybe use insert_after",
                splitted.len()
            ));
        }
        let mut new_content = String::new();
        for (i, part) in splitted.iter().enumerate() {
            if i == 1 {
                let content = format!("{}{}{}{}{}", between, "\n", content, "\n", between);
                new_content.push_str(&content);
            } else {
                new_content.push_str(&part);
            }
        }
        self.content = new_content;
        Ok(())
    }
    pub fn insert_after(
        &mut self,
        after: &str,
        content: &str,
        between_pattern: &str,
    ) -> Result<()> {
        let re = regex::Regex::new(&after).expect("Invalid regex");
        let splitted: Vec<&str> = re.split(&self.content).map(|s| s).collect();
        if splitted.len() != 2 {
            return Err(anyhow::anyhow!(
                "After {} pattern had {} partition(s). Could not determine after what in {}.",
                after,
                splitted.len(),
                self.file_path,
            ));
        }
        let mut new_content = String::new();
        for (i, part) in splitted.iter().enumerate() {
            if i == 1 {
                let content = format!(
                    "{}{}{}{}{}{}{}{}{}",
                    after,
                    "\n",
                    &between_pattern,
                    "\n",
                    content,
                    "\n",
                    &between_pattern,
                    "\n",
                    part
                );
                new_content.push_str(&content);
            } else {
                new_content.push_str(&part);
            }
        }
        self.content = new_content;
        Ok(())
    }

    pub fn save(&self) -> Result<()> {
        let mut file = std::fs::OpenOptions::new()
            .write(true)
            .truncate(true)
            .open(&self.file_path)
            .expect("Required that file exist to save");
        file.write_all(self.content.as_bytes())
            .expect("Could not write to file");
        Ok(())
    }
}

Here are some things that jumped out while skimming through the code...


#[derive(Debug)]
pub struct FileManipulator {
    pub content: String,
    pub file_path: String,    <-----
}

The file_path should be a std::path::PathBuf, not a String.

Filesystem paths are finicky and have several invariants (directory separators, MAX_PATH on Windows , etc.), but a String is just a blob of text with no specified format. In Rust, when you have assumptions that need to be maintained it's customary to create a new type which enforces these assumptions by construction.

As a bonus, it also means you can add domain-specific functionality (e.g. PathBuf has a join() method for appending another component to the path, so you don't need to do things like format!("{}/{}", old_path, new_bit)). Plus a String can only be UTF-8 text, while a PathBuf is allowed to contain invalid UTF-8 (e.g. because some other program created a non-UTF-8 file and you need to refer to it.


     pub fn insert_between(
        &mut self,
        between: &str,
        content: &str,
        after: Option<&str>,
    ) -> Result<()> {
        ...
        if splitted.len() != 3 {
            if let Some(after) = after {
                match self.insert_after(after, content, between) {
                    Ok(_) => return Ok(()),
                    Err(e) => return Err(anyhow::anyhow!(e)),
                };
            }
        ...
    }

this should just be return self.insert_after(after, content, between). You're actually wrapping the existing error in a new error here instead of returning the failure directly, so if you looked at the error chain you'd see two errors instead of 1.


It feels like FileManipulator::insert_between() is a bit confused. If splitted.len() != 3 then it behaves exactly like insert_after(), but only when after is Some... What case are we trying to handle here? Multiple occurrences, no occurrences, etc.?

As a side note, you can Slice Patterns use to match on the number of items in a slice. This may help make the different cases easier to identify when others (or yourself 6 months down the track) look over the code.

match splitted.as_slice() {
    [text] => handle_no_occurrences(),
    [before, matched_text, after] => concatenate_the_parts(),
    _ => todo!("Handle repeated occurrences or other cases"),
}

                let content = format!(
                    "{}{}{}{}{}{}{}{}{}",
                    after,
                    "\n",
                    &between_pattern,
                    "\n",
                    content,
                    "\n",
                    &between_pattern,
                    "\n",
                    part
                );

This looks kinda funny. Did you know that if you have a &[&str] you can just use [&str]::join("\n") to concatenate the lot with "\n" between each item? This also works with Vec<&str> if you don't know how many elements there will be ahead of time.


This is more of a stylistic thing (so feel free to ignore me - don't worry, I won't be offended), but why do you carry the file path around inside the FileManipulator? If it were me I'd create a couple functions that manipulate some buffer (e.g. a String) then let the user call std::fs::write() to write the string out to disk.

That way your manipulations don't care about the filesystem at all.


static SOME_JS_CONTENT_CLASS: &str = r#"...";

You'll normally use a static if it's important that only one occurrence of something exists in memory, and you need to do something with a pointer to it. Otherwise you'll normally use a const.


I understand this is an exercise to improve your Rust skills, but taking a step back... Are these sorts of file manipulations just a poor man's templating engine? Using a templating library like tera or handlebars could give you access to more powerful tools, allowing you to make even cooler things! :slightly_smiling_face:


By the looks of it, main.rs is mainly a way to try your code out. If you wanted to give it a bit more polish, you can accept command-line arguments. I personally like structopt because it makes it really easy to declare your program's inputs and have it stored in a struct using the correct data types.

There are a bunch of tutorials on writing command-line tools in Rust, but this one from the CLI workgroup has a lot of neat tips.


// main.rs

use lib::FileManipulator;

fn main() {
    static TEST_FILE: &str = "test-file.ts";
    static TEST_CONTENT: &str = "// Hello \n
console.log( 'Hello, world!' );";
    let mut some_file = FileManipulator::new(TEST_FILE.to_string(), TEST_CONTENT)
        .expect("Could not create empty file");
    some_file.save().expect("Could not save file");

    some_file
        .insert_between(
            "// HELLO ALERT",
            "alert( 'Hello, world!' );",
            Some("// Hello"),
        )
        .expect("Could not modify file");
    some_file.save().expect("Could not save file");
}

Did you know that main() can return a Result<(), Error>? If you do, the compiler will generate code that automatically prints out the error and set an appropriate exit code.

Even though it's functionally identical, it looks a little better to the user because instead of seeing a panic message, the anyhow::Error type will print the error text, the chain of errors (something like "reading the file failed because we couldn't open the file because there was a permission error").

anyhow::Error will even print a backtrace if the RUST_BACKTRACE environment variable is set.

3 Likes

First, thank you! I learned alot from your review.

Very good to get that pattern inn. It had stuck from my first occurrence with Rust when i couldn't pass a PathBuf around (don't remember why).

So true, im gonna remove the Some on inser_after because it was supposed to be the fallback if insert_between could not match.

Thank you, didn't know about that. That will make it much cleaner. :bowing_man:

I guess its because im new to the syntax and therefore i try to make my own interfaces so its faster for me to use it. But when put like that, I see that it is an anti-pattern for my learning. Thanks!

Now i learned that too :bowing_man:

Spot on!!! Templating engine was the right google word i couldent come up with when trying to find an implementation to use :bowing_man:

No, mindblown again :bowing_man:

Again, thank you! :slight_smile:

I'm guessing you've got an OO background? The alternative design I was talking about is a more procedural/functional approach, while the FileManipulator approach feels closer to what I'd write in C#.

I'd like to emphasise that there's nothing wrong with the way you structured your code! It's not an anti-pattern, just different to how I'd write it (and I'm just one random Rustacean on the internet!).

Rust lets you write code in a large variety of styles ("paradigms"). That means if a particular problem lends it self to functional programming (e.g. stateless transformation of data) or object-oriented programming (e.g. a library where users may want to inject their own behaviour to tweak business logic), then you've got the flexibility to write code that way.

I'm guessing you got some sort of borrow checker error, either because you tried to move a borrowed value or tried to use the original variable after it was moved elsewhere.

A PathBuf isn't Copy, so to assign it to another variable you either need to clone() it (copying the underlying heap allocation) or move it (invalidating the previous variable/binding).

1 Like

Correct you are. Thought really appreciated your bump there because it frees me from my old abstractions about this. Also. The fs interface in Rust is easier/more consistent from what i used before (after you have gotten over a few concepts) so that having it be apart of the FileManipulator (BufferManipulator) is not needed anymore (my train of thoughts).

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.