What is the most efficient way to convert `&str` of `HashMap<&str, ()>` to lowercase?

d4h0 · January 29, 2021, 10:18am

Hi,

Let's say we have the following code:

use std::collections::HashMap;

fn main() {
    let src = "FÖÖ BÄR baz foo abc def BÄR";
    let mut case_sensitive = HashMap::new();
    let mut case_insensitive = HashMap::new();
    for x in src.split(' ') {
        case_sensitive.entry(x).or_insert(());
        // Is there anything better than allocating a new `String`?
        let x_i: String = x.to_lowercase();
        case_insensitive.entry(x_i).or_insert(());
    }
    dbg!(case_sensitive);
    dbg!(case_insensitive);
}

Playground

What would be the most efficient way to get a HashMap whose keys are lower-case, so it's possible to do lookups that are case-insensitive?

One idea I had, was to allocate a String to which the case-insensitive values are written, and then lower-case this string:

use std::collections::HashMap;

fn main() {
    let src = "FÖÖ BÄR baz foo abc def BÄR";
    let mut case_sensitive = HashMap::new();
    let mut case_insensitive = String::new();
    for x in src.split(' ') {
        case_sensitive.entry(x).or_insert(());
        case_insensitive.push_str(x);
        case_insensitive.push(' ');
    }
    let case_insensitive = case_insensitive.to_lowercase();
    let case_insensitive: HashMap<&str, ()> = case_insensitive
        .trim_end()
        .split(' ')
        .map(|x| (x, ()))
        .collect();
    dbg!(case_sensitive);
    dbg!(case_insensitive);
}

Playground

This would result in three allocations (which would be fine).

Another approach I thought of, was to use something like SmartString, but I can't find a crate that has such a type that has to_lowercase.

One problem is, that &str can contain non-ASCII characters (without that requirement str::as_bytes + char::make_ascii_lowercase + str::from_utf8 probably could be used).

The input I actually have is cssparser::CowRcStr (which implements Deref<Target = str>), in case that would make a difference.

2e71828 · January 29, 2021, 10:30am

smallstr::SmallString will only make a heap allocation if the held string is too large for its statically-allocated buffer. It implements FromIterator<char>, so you should be able to do something like this to avoid the intermediate String value (untested):

let case_insensitive_key: SmallString<[u8;16]> =
    x.chars().flat_map(char::to_lowercase).collect();

EDIT: Looking at the SmartString docs, this approach would work with it too.

d4h0 · January 29, 2021, 11:07am

@2e71828: Thank you!

I'm curious if there are other significantly different approaches, so I'll keep this as "un-solved" a bit longer.

kornel · January 29, 2021, 11:47am

If you only lowercase for sake of comparison, then the most efficient way is not to convert at all, but wrap them in case-ignorant-comparison wrapper:

d4h0 · January 29, 2021, 12:53pm

@kornel: Nice! This should work for my particular use-case.

Are you sure, this always will be faster (on average), than creating a SmartString with @2e71828's approach (if the string is small enough to not allocate on the heap)? Wouldn't Unicase have to do the case-folding at every comparison?

Cocalus · January 29, 2021, 2:51pm

Another approach If many of the strings are already lower case, you could store a Cow<'_,str> in the lowercase version to only allocate when there's an uppercase character. Since your inputs are kind of a cow type you might be able to use them directly

use std::collections::HashSet;
use cssparser::CowRcStr;

fn main() {
    println!("Hello, world!");

    let src = "FÖÖ BÄR baz foo abc def BÄR";

    let src:Vec<_> = src.split(' ').map(CowRcStr::from).collect();

    let mut case_sensitive = HashSet::new();
    let mut case_insensitive = HashSet::new();
    let mut buf = String::new();

    for x in src {
        buf.push_str(&x);
        buf.make_ascii_lowercase();

        if buf == x.as_ref() {
            // CowRcStr should be very cheap to clone
            case_insensitive.insert(x.clone());
        } else {
            case_insensitive.insert(CowRcStr::from(buf.clone()));
        }

        case_sensitive.insert(x);
        buf.clear();
    }
}

d4h0 · January 29, 2021, 4:46pm

@Cocalus: Thank you, this solution looks interesting too!

d4h0 · January 29, 2021, 4:56pm

I've selected @2e71828's approach as the solution because it answers my question and is a great answer for it, but I decided to use @kornel's approach because it fits my use-case well and is a bit easier to implement (including the remainder of the program), and might even be faster than other solutions.

Thanks everybody!

system · April 29, 2021, 4:57pm

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.

Topic		Replies	Views
Unexpected results of to_lowercase function when applied to HashMap help	7	735	January 12, 2023
Char to_lowercase problem	3	709	July 10, 2021
[SOLVED] (no) Problem mapping charaters of two strings in a hashmap help	6	1309	January 12, 2023
&str vs String for HashMap key help	27	2033	February 8, 2024
Is it not safe to use &String or &str for key of HashMap? help	8	918	September 11, 2023

What is the most efficient way to convert `&str` of `HashMap<&str, ()>` to lowercase?

Related Topics