Unexpected results of to_lowercase function when applied to HashMap


#1

Dear Rustaceans,

I am trying to write a program that reads a string, splits it into words on some delimiter, applies to_lowercase() to each word, and adds the word to a HashMap together with the number of times it occurs in the string.

Below is the part of the program I am having trouble with:

use std::collections::HashMap;

fn main() {
    // let buf = "тест Тест test tEst TESt";
    let buf = "doG dOG DOg DOG собАка сОБАкА";

    let mut words: HashMap<&str, usize> = HashMap::new();

    for word in buf.split(is_delimiter) {
        let lowercase = word.to_lowercase();
        let lower = &*lowercase;
        print!("{}, ", lower);
        if let Some(count) = words.get_mut(lower) {
            *count += 1;
            continue;
        }

        words.insert(word, 1);
    }

    println!("{:?}", words);
}

fn is_delimiter(c: char) -> bool {
    let delimiter = [
        ' ',
        '\t',
        '\n',
    ];
    for i in delimiter.iter() {
        if c == *i {
            return true;
        }
    }
    false
}

With the first buf (the test words), the program gives the expected result: all words, both Latin and Cyrillic, are converted to lowercase, and the HashMap correctly counts 2 for the Cyrillic and 3 for the Latin version of test.

However, when I run it with the second buf (the dogs, with the first one commented out), the lower variable holds the expected value, but after the get_mut() check the words end up in the HashMap in their original case.

I have also tested this on larger pieces of text: sometimes words are added correctly and sometimes not, and I cannot see the reason for that.

Hope you can help me out.


#2

Hm, shouldn’t you insert the lower-cased version of the word into the map, rather than word itself, in words.insert(word, 1)?
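
To see why the first input appears to work while the second does not, here is a minimal sketch that mirrors the loop from the question: it looks up the lowercased word but inserts the original slice as the key. The function name count_original_keys is hypothetical, and split_whitespace stands in for the is_delimiter helper to keep the example short.

```rust
use std::collections::HashMap;

// Mirrors the loop from the question: the lookup uses the lowercased word,
// but the key that gets inserted is the original, untouched slice.
fn count_original_keys(buf: &str) -> HashMap<&str, usize> {
    let mut words: HashMap<&str, usize> = HashMap::new();
    for word in buf.split_whitespace() {
        let lowercase = word.to_lowercase();
        if let Some(count) = words.get_mut(lowercase.as_str()) {
            *count += 1;
            continue;
        }
        // Original case goes in here, so later lowercase lookups only
        // succeed if this word happened to be lowercase already.
        words.insert(word, 1);
    }
    words
}

fn main() {
    // The first occurrence of each word is already lowercase, so every
    // later lookup finds it and the counts look right.
    println!("{:?}", count_original_keys("тест Тест test tEst TESt"));

    // No occurrence is lowercase, so every lookup misses and each
    // spelling is stored as its own key.
    println!("{:?}", count_original_keys("doG dOG DOg DOG собАка сОБАкА"));
}
```

With the test input the map ends up with two keys (counts 2 and 3), while the dog input produces six keys with a count of 1 each. This would also explain the mixed results on larger texts: a word is counted correctly only from the point where an all-lowercase spelling of it has been inserted.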


#3

Which then simplifies the loop body to

let lowercase = word.to_lowercase();
*words.entry(lowercase).or_insert(0) += 1;
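
A fuller sketch of that loop, assuming the map is declared as HashMap<String, usize> so that it can own the lowercased keys. The function name count_words is hypothetical, and split_whitespace is used in place of the is_delimiter helper from the question.

```rust
use std::collections::HashMap;

// Counts words case-insensitively; the map owns each lowercased key.
fn count_words(buf: &str) -> HashMap<String, usize> {
    let mut words: HashMap<String, usize> = HashMap::new();
    for word in buf.split_whitespace() {
        // entry() takes the key by value; or_insert(0) returns a mutable
        // reference to the existing count or to a freshly inserted zero.
        *words.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    words
}

fn main() {
    let counts = count_words("doG dOG DOg DOG собАка сОБАкА");
    println!("{:?}", counts);
}
```

Because entry() handles both the "already present" and "missing" cases, the separate get_mut and insert calls are no longer needed.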

#4

Hey,

Thank you for pointing this out.

This is indeed a mistake on my part, but I still cannot figure out how to make it work because of borrow checker errors.

Following the advice above to use

let lowercase = word.to_lowercase();
*words.entry(lowercase).or_insert(0) += 1;

the compiler reports a mismatched types error and asks for a reference (&) to be added:

Option 1:

let lowercase = &word.to_lowercase();
*words.entry(lowercase).or_insert(0) += 1;

temporary value needs to live until here -> end of main()

Option 2:

let lowercase = word.to_lowercase();
*words.entry(&lowercase).or_insert(0) += 1;

borrowed value (lowercase) needs to live until here -> end of main()

I see that my issue is very similar to https://www.reddit.com/r/rust/comments/742ww8/borrowed_value_needs_to_live_until_here_why/ but I cannot figure out how to get rid of the borrow error.


#5

I think you want a HashMap<String, usize>, since a reference to lowercase is only valid for one iteration of the loop.


#6

Sorry, I overlooked that your hashmap is declared as <&str, usize>. As @juggle-tux said, you need to make it <String, usize> for this to work at all, since to_lowercase creates a new String which lives only for the duration of one for loop iteration.
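
As a rough illustration of the ownership point, here is a sketch that keeps the get_mut/insert structure from the question but moves the lowercased String into the map. The count_words name and the empty-word guard are assumptions added for the example; is_delimiter uses the same three delimiters as the original.

```rust
use std::collections::HashMap;

// Same delimiters as in the question, written more compactly.
fn is_delimiter(c: char) -> bool {
    [' ', '\t', '\n'].contains(&c)
}

fn count_words(buf: &str) -> HashMap<String, usize> {
    let mut words: HashMap<String, usize> = HashMap::new();
    for word in buf.split(is_delimiter) {
        // Assumed guard: consecutive delimiters yield empty slices.
        if word.is_empty() {
            continue;
        }
        let lowercase = word.to_lowercase();
        // A String-keyed map can be queried with a &str, so the lookup
        // only borrows `lowercase` for the duration of this call.
        if let Some(count) = words.get_mut(lowercase.as_str()) {
            *count += 1;
            continue;
        }
        // Moving the String into the map transfers ownership, so the key
        // outlives this iteration instead of being dropped at its end.
        words.insert(lowercase, 1);
    }
    words
}

fn main() {
    println!("{:?}", count_words("doG dOG\tDOg  DOG\nсобАка сОБАкА"));
}
```

Compared with the entry-based version, this one only gives up the String on a miss, at the cost of a slightly longer loop body.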


#7

Thank you for the comments - using HashMap<String, usize> solved the issue.