Is using `clone()` a lot, bad practice?

Kind of a continuation of this: Is it ok to use clone when creating a struct from another one? - #8

I have a struct that I am using in a program I am working on, and one thing I am running into quite frequently is having to use clone() on my struct on each line. Let me show an example:

    pub fn run(self) {
        let mut data = String::new();
        let response = self.clone().validate();
        let request = self.clone().generate(response.clone(), &mut data);
        self.connect(response.clone(), request);
    }

I am still in the process of building this and will be incorporating having to go through a few more for loops of iterated data. So I assume I will end up having to use clone on self a few more times.

Here is the actual struct that I essentially want to live the whole program, and be referenced in different places.

pub struct Properties {
    pub sub_command: String,
    pub word_list: Vec<String>,
    pub http_method: String,
    pub post_data: String,
    pub headers: Vec<String>,
    pub exclude_status: Vec<String>,
    pub exclude_word_count: Vec<String>,
    pub exclude_content_length: Vec<String>,
    pub url: String,
}

Am I using clone(), causing a lot of performance degradation? Would I be better off, using clone() in places where I reference the struct only to get snippets of its data?

        // Get hostname from URL string. Validate if port is connected to the end ":443"
        let protocol_appended = Regex::new(r"(:\d{2,5})").unwrap();
        let mut hostname = String::new();
        if protocol_appended.find(&self.properties.url).is_some() {
            if self.properties.url.find("://").is_some() {
                hostname = self.properties.url.split("://").collect::<Vec<&str>>()[1].split(":").collect::<Vec<&str>>()[0].to_string();
            } else {
                hostname = self.properties.url.split("/").collect::<Vec<&str>>()[0].split(":").collect::<Vec<&str>>()[0].to_string();
            }
        } else {
            if self.properties.url.find("://").is_some() {
                hostname = self.properties.url.split("://").collect::<Vec<&str>>()[1].split("/").collect::<Vec<&str>>()[0].to_string();
            } else {
                hostname = self.properties.url.split("/").collect::<Vec<&str>>()[0].to_string();
            }
        }

Any thoughts greatly appreciated!

First of all, don't use stringified types; you should leverage Rust's type system and use type-driven development. For example, use the Url type from the Url crate instead of String.

There's nothing inherently wrong with cloning. When you have owned types it's the most natural thing to do, but you should also consider giving away ownership if you won't use a value anymore.

5 Likes

That does look like a lot to clone.

Is your struct modified in the course of the program, or do you construct it once at the beginning and it doesn't change after that?

Is there a reason all your methods seem to take self by value? (run, validate, generate...)


Not exactly what you asked, but your adhoc URL parsing has a lot of inefficiencies and assumptions. I suggest using a crate dedicated to the task.

6 Likes

The structure itself is not modified after its initial creation. Its the resulting data of CLAP argument taking in various fields from CLI. Example:

./bin fuzz -u http://google.com/login -w ./test.txt -d "username=TAR1&password=TAR2" -m post

The structure is derived and the values are stored there. They are referenced throughout the code to get the pieces of information out of them, but for the most part are then stored in other variables in order to modify them to suite different things.

That's pretty much why all methods (almost all) take self in a way to access the data.

I was afraid of adding too much bloat from crates, but I think from the two responses I see here, I should focus on using pre-defined types that already exist for things instead of re-inventing the wheel =/ lol.

But the question was why they take self by value,( as opposed to by reference).

2 Likes

Going through your example:

    pub fn run(self) {
        ...
        let response = self.clone().validate();

At first glance, I'm mainly concerned about the validate() here. Why would self contain invalid data at this stage? (Cf. "Parse, don't validate".) Moreover, there's no sign of error handling, so I'm assuming that invalid data is either discarded or somehow replaced with good data (which isn't an operation I'd use the name "validate" for, but that's a bit out of scope). Hence, I'm assuming the reason you're cloning is to avoid mutating self, but run() takes ownership of self, so the mutations wouldn't affect anything in the calling context anyway, and (as I will cover in a bit) it doesn't make sense to use the unvalidated self in subsequent lines — so why not get rid of the clone and just do let response = self.validate();?

Note: Below, I'm going to assume that validate() returns Self; it if returns a different type, then validate() is abolutely the wrong name to give this method.

        let request = self.clone().generate(response.clone(), &mut data);

Now you're cloning self and response because ... why? The only reason to do this is if generate() modifies them, but I don't see why generating a request would do that. Moreover, you're using the original self as the receiver rather than the validated response, which I can only assume means that generate() can potentially be passed invalid data, which seems like a bad idea. Presumably, you'd want to operate solely on the validated response, so let request = response.generate(&mut data);.

        self.connect(response.clone(), request);

Again, self is pre-validation, so I can only assume that it can contain bad data, so why are you trusting it with method calls? If I'm understanding what the code is trying to do, wouldn't response.connect(request); be better?

Even if none of my suggestions above are taken, the clone here is 100% unnecessary, as response is not used later in run(), so there's no need to create another copy. I'd expect clippy to warn about the useless clone here; are you not using clippy?

6 Likes

Then probably a lot of the methods can take &self and reduce the need for cloning.

Or alternatively you could put it in an Arc, which is cheap to clone. Depending on how your program works, your other data structures could hold onto an Arc<Properties> and query it when needed instead of cloning data out of it.

Or alternatively some mix, like putting a large and often-cloned subset of Properties into an Arc etc.

8 Likes

I really appreciate the thorough response. I definitely need to take a step back and plan this out a little better.

You should do as you said. Use a reference to the CLAP argument structure.

I suggest enabling these clippy lints and see if they catch something:

https://rust-lang.github.io/rust-clippy/master/index.html#/needless_pass_by_ref_mut
https://rust-lang.github.io/rust-clippy/master/index.html#/wrong_self_convention
https://rust-lang.github.io/rust-clippy/master/index.html#/unnecessary_mut_passed
https://rust-lang.github.io/rust-clippy/master/index.html#/redundant_clone

2 Likes

I honestly did not even know about this. I will definitely look into getting these clippy lints enabled as I would love to get feedback, especially when I am doing things unnecessarily. Thank you for this!

1 Like

I believe that some of the cloning is impossible to reduce, so using Arc to get the best performance I can would be the best option here. I do appreciate your insights. I have been looking into Arc as I'm still wet behind the ears with Rust.

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.