Avoiding clones

Let's say I have the following two maps (code_map and key_map):

use std::collections::HashMap;

#[derive(Clone, Debug)]
struct Foo {
    code: String,
    key: Option<String>,
}

fn main() {

    let foos = Vec::from([
        Foo { code: "code1".to_owned(), key: Some("key1".to_owned()) },
        Foo { code: "code2".to_owned(), key: None },
    ]);

    let code_map: HashMap<String, Foo> = foos.clone().into_iter()
        .map(|x| (x.code.clone(), x))
        .collect();

    let key_map: HashMap<String, Foo> = foos.into_iter()
        .filter_map(|x| x.key.clone().map(|e| (e, x)))
        .collect();

    println!("{code_map:?}");
    println!("{key_map:?}")

}

Source

There are quite a few clones involved. One way to avoid this would be the use of Rc as here:

use std::collections::HashMap;
use std::rc::Rc;

#[derive(Clone, Debug)]
struct Foo {
    code: String,
    key: Option<String>,
}

fn main() {

    let foos = Vec::from([
        Rc::new(Foo { code: "code1".to_owned(), key: Some("key1".to_owned()) }),
        Rc::new(Foo { code: "code2".to_owned(), key: None }),
    ]);

    let code_map: HashMap<String, Rc<Foo>> = foos.iter()
        .map(|x| (x.code.clone(), x.clone()))
        .collect();

    let key_map: HashMap<String, Rc<Foo>> = foos.iter()
        .filter_map(|x| x.key.clone().map(|e| (e, x.clone())))
        .collect();

    println!("{code_map:?}");
    println!("{key_map:?}")

}

Source

Is this the idiomatic way to go or is there a better alternative?

1 Like

Yes, Rc is a reasonable option here, presuming that you do not need to mutate the Foos after they enter the map.

You can also replace the Strings in Foo and the maps with Rc<str> in order to avoid allocating a separate copy of every string just for the map keys.

4 Likes

Thanks!

Is it common in production code to have nested Rc's throughout the datastructures?

It depends greatly on the structure and usage pattern of those data structures.

The primary consideration is that it is useful to avoid Rc around some part that is to be mutated (rather than replaced entirely), since then one has to deal with the constraints of shared mutability (such as an Rc<RefCell<Foo>>. In that case, one might instead arrange so that the maps instead hold some index or key value identifying the Foo, rather than storing Rc<Foo> in the maps directly. That way, each Foo remains singly owned (from the perspective of ownership and borrow checking) and can be mutated through ordinary &mut access.

One might also avoid Rc in the same way with the intent of reducing the total number of allocations; how much this matters depends on the number of Foos and how many allocations are involved in each one anyway. But you should never try to tweak your program to be faster without first having noticed that it is slow, and second, measured how it is slow (profiling and benchmarking).

3 Likes

Thanks, makes sense :slight_smile:

Would maps of references work here?

let foos = Vec::from([
    Foo { code: "code1".to_owned(), key: Some("key1".to_owned()) },
    Foo { code: "code2".to_owned(), key: None },
]);

let code_map: HashMap<&String, &Foo> = (&foos).into_iter()
    .map(|x| (&x.code, x))
    .collect();

let key_map: HashMap<&String, &Foo> = (&foos).into_iter()
    .filter_map(|x| x.key.as_ref().map(|e| (e, x)))
    .collect();

println!("{code_map:?}");
println!("{key_map:?}")

No, you can't use references that point to other parts of the structure that owns the references. This is both because the language contains no way to express the lifetime of those references, and if it could, it would still be difficult to express that an entry and its reference may be removed together, but an entry must not be removed without first removing its references.

This is called the "self-referential struct" problem.

Oops, I did not notice there is no struct involved here. With local variables only, this can work, as long as you do not ever need to remove items from the vector. Whether this is adequate or impossible depends entirely on the application.

1 Like

Actually there is a struct involved in the real code. I tried to make use of @undefined.behavior's suggestion:

use std::collections::HashMap;

#[derive(Clone)]
struct Foo {
    code: String,
    key: Option<String>,
}

struct Context<'a> {
    codes: HashMap<&'a String, &'a Foo>,
    keys: HashMap<&'a String, &'a Foo>,
}

fn get_ctx<'a>() -> Context<'a> {
    let foos = Vec::from([
        Foo { code: "code1".to_owned(), key: Some("key1".to_owned()) },
        Foo { code: "code2".to_owned(), key: None },
    ]);

    let code_map: HashMap<&String, &Foo> = (&foos)
        .iter()
        .map(|x| (&x.code, x))
        .collect();

    let key_map: HashMap<&String, &Foo> = (&foos)
        .into_iter()
        .filter_map(|x| x.key.as_ref().map(|k| (k, x)))
        .collect();

    let ctx = Context {
        codes: code_map,
        keys: key_map,
    };

    ctx
}

pub fn main() {
    let ctx = get_ctx();

}

Source

This fails compiling with error[E0515]: cannot return value referencing local variable 'foos'.

That case isn't even self-referential; after the function returns, there is no owner of foos at all. When you want a data structure with that arrangement of pointers, the pointers must be Rc (or Arc) and not &.

4 Likes

Yeah, you'll need to make sure that foos outlive Context: you can't put it into the Context, as it would create self-references, but if foos lives somewhere else and passed in as a parameter it should work.

It all depends on what lifetimes you have for the derived maps, and when/if the vector gets updated.

4 Likes

Makes sense, thanks again :+1:

Ftfy: :stuck_out_tongue:

3 Likes

By the way, .clone().into_iter() is better done as .iter().cloned() to avoid allocating a new Vec.

4 Likes

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.