Zero cost abstraction for name generation?

I want to do simple thing: get as input list of almost unique strings (all strings should be unique, except “_”) and output list of unique String.
But the problem that language force me to do unnecessary allocations.
I can not use just HashSet<&str> this is code impossible to compile,
I have to use HashSet<String> for checking that string is unique.
Is any way to prevent memory allocation for String duplicates (take into consideration that I do not want change generate_names signature) ?

use std::collections::HashSet;

fn main() {
    let mut names: Vec<String> = vec!["_".into(), "second".into(), "_".into()];
    generate_names(&mut names).unwrap();
}

fn generate_names(names: &mut [String]) -> Result<(), String> {
    let mut known_names = HashSet::<&str>::with_capacity(names.len());
    for name in names.iter() {
        if name != "_" {
            if known_names.contains(name.as_str()) {
                return Err(format!("duplicate name {}", name));
            }
            known_names.insert(name.as_str());
        }
    }
    for (i, name) in names.iter_mut().enumerate() {
        if name == "_" {
            let templ = format!("x{}", i);
            let new_name = new_unique_name(&known_names, &templ);
            *name = new_name;
            known_names.insert(name.as_str());
        }
    }
    Ok(())
}

fn new_unique_name(names: &HashSet<&str>, templ: &str) -> String {
    unimplemented!();
}

Will this work:

let mut names: Vec<String> = vec!["_".into(), "second".into(), "_".into()];
names.sort_unstable().dedup();

No, I need preserve place of unique names in input list.
In other words I need almost exactly what in Rust code,
scan input array, and replace placehodlers with unique names, preserve another strings on their places.

You need to iterate only once on the slice, so you need to also keep a list of mutable references to your placeholders:

3 Likes

But your code still do extra allocation:

let mut placeholders = Vec::with_capacity(names.len());

Plus as result it adds data to HashSet, while it should be in names after end of function

In my code I always modify String before inserting into HasSet,
but have no idea how convince rustc that I do the right thing.

The suggested code works as you intended

Yes, it allocates just one vector of pointers, it’s still quite cheap compared to the hashset and format! calls; but I agree that the vec should not be strictly necessary, algorithm-wise, it’s a trick to circumvent limitations regarding the sometimes bad granularity of borrows: if you use two loops, Rust does not see that the items you work on are disjoint.

With quite a bit of effort to do it soundly (Miri does not complain), I have managed to no longer need the extra vec of mutable references. However, it involves complex (and thus slightly likely to be unsound) unsafe code, just to save one vector allocation; use at your own risk:

4 Likes

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.