How to split a string into words while keeping apostrophes (') in Rust

This program counts the number of occurrences of each word in a string. Every test passes except when words = "Joe can't tell between 'large' and large." or words = "First: don't laugh. Then: don't cry.". Splitting on !c.is_alphanumeric() also splits on the apostrophe, so "don't" becomes "don" and "t". If I remove !c.is_alphanumeric() from the split closure, I would have to list every single special character on which the words should be split.

This is a beginner-level exercise on Exercism, so I wanted to avoid the regex crate.

use std::collections::HashMap;

pub fn word_count(words: &str) -> HashMap<String, u32> {
    let mut indexes: HashMap<String, u32> = HashMap::new();
    let to_lowercase = words.to_lowercase();
    // Debug output: shows the pieces produced by the split, including empty strings.
    println!("{:?}", to_lowercase.split(|c: char| !c.is_alphanumeric()).collect::<Vec<&str>>());
    // Splits on every non-alphanumeric character, which includes the apostrophe,
    // then skips the empty pieces left between adjacent separators.
    for c in to_lowercase.split(|c: char| !c.is_alphanumeric()).filter(|&x| x != "") {
        let entry = indexes.entry(c.to_string()).or_insert(0);
        *entry += 1;
    }

    indexes
}

Tests

#[test]
#[ignore]
fn test_normalize_case() {
    check_word_count("go Go GO Stop stop", &[("go", 3), ("stop", 2)]);
}

#[test]
#[ignore]
fn with_apostrophes() {
    check_word_count(
        "First: don't laugh. Then: don't cry.",
        &[
            ("first", 1),
            ("don't", 2),
            ("laugh", 1),
            ("then", 1),
            ("cry", 1),
        ],
    );
}

#[test]
#[ignore]
fn with_quotations() {
    check_word_count(
        "Joe can't tell between 'large' and large.",
        &[
            ("joe", 1),
            ("can't", 1),
            ("tell", 1),
            ("between", 1),
            ("large", 2),
            ("and", 1),
        ],
    );
}

Wouldn't split_whitespace() work in this case?

Yes, for most cases, but not for those that contain apostrophes or other special characters.
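To illustrate the point, here is a minimal sketch of what splitting on whitespace alone produces. The helper name whitespace_tokens is hypothetical and not part of the exercise; it only lowercases the input and splits it, so punctuation and quotation marks stay attached to the words.

```rust
/// Lowercases the input and splits it on whitespace only.
/// Punctuation and quotes are NOT removed, so they remain part of each token.
/// (Hypothetical helper for illustration; not part of the exercise.)
pub fn whitespace_tokens(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split_whitespace()
        .map(String::from)
        .collect()
}
```

With the inputs from the failing tests, "First:" keeps its colon and 'large' keeps its quotes, so they would be counted as different words from "first" and "large".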

OK, then I'm not understanding the exercise. Since this is a beginner-level task, I'd assume it lets you get away with saying that words are anything separated by whitespace. Can you elaborate as to what the exercise is precisely expecting you to do?


After splitting on whitespace you could trim leading and trailing punctuation from each word using:

word.trim_matches(|c: char| !c.is_alphanumeric())

(Note: This returns a slice of the input word; it doesn't mutate its input in-place.)
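Putting the suggestion together, a minimal sketch of the whole function might look like the following. It assumes that words are always separated by whitespace, which holds for the two failing tests shown in the question; the variable names counts, lowered, and token are my own.

```rust
use std::collections::HashMap;

pub fn word_count(words: &str) -> HashMap<String, u32> {
    let mut counts: HashMap<String, u32> = HashMap::new();
    let lowered = words.to_lowercase();

    for token in lowered.split_whitespace() {
        // Strip leading and trailing non-alphanumeric characters
        // (quotes, colons, periods). An apostrophe inside a word,
        // as in "don't", is untouched because only the ends are trimmed.
        let word = token.trim_matches(|c: char| !c.is_alphanumeric());

        // A token made only of punctuation trims down to an empty string.
        if word.is_empty() {
            continue;
        }

        *counts.entry(word.to_string()).or_insert(0) += 1;
    }

    counts
}
```

One limitation of this sketch: input such as "one,two,three" contains no whitespace, so it would be counted as a single word. If the exercise includes cases like that, the split step would need to treat those separators as delimiters too.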
