Acronym Function - toy problem

Hello, thanks for looking at this little toy problem. I'm learning Rust and found an exercise to make an acronym function. Here is my solution:

pub fn abbreviate(phrase: &str) -> String {
    let split_chars = [' ', ',', ':', '_', '.', '-'];
    phrase
        .split(&split_chars)
        .filter(|word| !word.is_empty())
        .flat_map(|word| {
            word.chars()
                .zip(word.chars().cycle().skip(1))
                .enumerate()
                .filter_map(|(i, (c, c_copy))| {
                    (i == 0 || (c.is_uppercase() && c_copy.is_lowercase()))
                        .then(|| c.to_ascii_uppercase())
                })
        })
        .collect::<String>()
}

The function name, input, and output must stay the same, but I'd like to hear any other suggestions people may have on how to make the solution better.

Thanks!

I know you said not to change the output, but are you sure it works 100% correctly? Asking because it seems a little odd:

  • Words that start and end with uppercase don't include the final uppercase, but
  • Words that start with lowercase and end in uppercase include both;
  • You're checking Unicode case properties but using to_ascii_uppercase
    • So e.g. "δaΔa" becomes "δΔ" not "ΔΔ"
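
To make these points concrete, here is a small sketch that reuses the function from the first post unchanged. The sample words in the trailing comments are made up for illustration and are not taken from the exercise's tests.

```rust
// Copy of the function from the first post, unchanged.
pub fn abbreviate(phrase: &str) -> String {
    let split_chars = [' ', ',', ':', '_', '.', '-'];
    phrase
        .split(&split_chars)
        .filter(|word| !word.is_empty())
        .flat_map(|word| {
            word.chars()
                // cycle() makes the last character pair up with the first one.
                .zip(word.chars().cycle().skip(1))
                .enumerate()
                .filter_map(|(i, (c, c_copy))| {
                    (i == 0 || (c.is_uppercase() && c_copy.is_lowercase()))
                        .then(|| c.to_ascii_uppercase())
                })
        })
        .collect::<String>()
}

// Made-up inputs, one per point above:
//   abbreviate("HelloW") gives "H"   (final uppercase dropped: 'W' wraps to 'H')
//   abbreviate("helloW") gives "HW"  (final uppercase kept: 'W' wraps to 'h')
//   abbreviate("δaΔa")   gives "δΔ"  (to_ascii_uppercase leaves 'δ' as it is)
```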

Hello and thanks for your comment.

Regarding your points:

  • I had not thought of words that start with lowercase but end in uppercase. Thank you for pointing out that the cycle would mean that in a word like "mRNA", only the 'm' and 'A' would be kept.
    My first instinct is to add another test to filter_map excluding this case. Perhaps by enumerating the c_copy iterator and adding a test for i < i_copy, like below:
pub fn abbreviate(phrase: &str) -> String {
    let split_chars = [' ', ',', ':', '_', '.', '-'];
    phrase
        .split(&split_chars)
        .filter(|word| !word.is_empty())
        .flat_map(|word| {
            word.chars()
                .zip(word.chars().enumerate().cycle().skip(1))
                .enumerate()
                .filter_map(|(i, (c, (i_copy, c_copy)))| {
                    (i == 0 || (c.is_ascii_uppercase() && c_copy.is_ascii_lowercase() && i < i_copy))
                        .then(|| c.to_ascii_uppercase())
                })
        })
        .collect::<String>()
}
  • Regarding the Unicode case properties - at first I tried to work with Unicode, but ran into trouble when I wanted to collect the return value of to_uppercase into a String. I peeked at all the test cases offered for the exercise and saw that they were all ascii, so I'll stick with the ascii version as above.

I looked a little into converting between Unicode and ascii, and it looks like a bit of a rabbit hole. Are there any good resources you know of which outline how to deal with things like this (from the to_uppercase docs):

assert_eq!('ß'.to_uppercase().to_string(), "SS");

You are doing a bunch of unnecessary work (e.g. the filter() and the final turbofish), and the code is hard to read due to the overuse of bool combinators and parentheses. Unicode support is trivial to add by inserting a call to flatten(). I would write it like this:

pub fn abbreviate(phrase: &str) -> String {
    phrase
        .split(&[' ', ',', ':', '_', '.', '-'])
        .flat_map(|word| {
            word.chars()
                .zip(word.chars().skip(1))
                .enumerate()
                .filter_map(|(i, (curr, next))| {
                    if i == 0 || curr.is_uppercase() && next.is_lowercase() {
                        Some(curr.to_uppercase())
                    } else {
                        None
                    }
                })
        })
        .flatten()
        .collect()
}
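
As a side note on why flatten() is enough here: char::to_uppercase returns an iterator of chars rather than a single char, because one character can uppercase to several. A minimal sketch of collecting that into a String follows; the helper name initial_upper is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: returns the Unicode-uppercased first character
/// of `word` as a String (empty if `word` is empty).
fn initial_upper(word: &str) -> String {
    word.chars()
        .take(1)
        // Each char maps to an iterator of chars; flat_map concatenates them.
        .flat_map(char::to_uppercase)
        .collect()
}
```

With this helper, "δelta" gives "Δ", while "ßaby" gives the two-character string "SS".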

Hello again and thank you for your advice. As you pointed out, the first filter is unnecessary, so I removed it. Thanks!

Unfortunately the cycle is necessary to pass tests where one of the words to be abbreviated is only one character long. Without it, skip(1) leaves nothing to iterate over for such a word, so zip produces no pairs and the word is dropped (I think?):

assert_eq!(
    acronym::abbreviate("Something - I made up from thin air"),
    "SIMUFTA"
);

Using your code above:

thread 'consecutive_delimiters' panicked at 'assertion failed: `(left == right)`
  left: `"SMUFTA"`,
 right: `"SIMUFTA"`', tests/acronym.rs:68:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
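
The missing 'I' can be reproduced in isolation. In this sketch (the helper names pairs and pairs_cycled are made up for illustration), zip stops as soon as either side runs out, so a one-character word zipped with its own skip(1) yields no pairs at all, while the cycled version wraps around to the first character.

```rust
/// Hypothetical helper: pairs each char with its successor, no wrap-around.
fn pairs(word: &str) -> Vec<(char, char)> {
    word.chars().zip(word.chars().skip(1)).collect()
}

/// Hypothetical helper: same, but the last char is paired with the first.
fn pairs_cycled(word: &str) -> Vec<(char, char)> {
    word.chars().zip(word.chars().cycle().skip(1)).collect()
}

// pairs("I")         is empty, so filter_map never sees 'I'
// pairs_cycled("I")  is [('I', 'I')]
// pairs("up")        is [('u', 'p')]
// pairs_cycled("up") is [('u', 'p'), ('p', 'u')]
```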

The problem with using Unicode shows up in cases like this one. With your code above:

"ßaby ßoomer" becomes: SSSS

This is because

assert_eq!('ß'.to_uppercase().to_string(), "SS");

Because I only really need to pass ascii tests for the exercise, I'll leave working with Unicode values to a later time, though I would certainly like to see a good resource on how to handle cases like those above.

I hope there are other ways to improve my code because it does seem very clunky, especially when trying to deal with the case of a word which starts with lowercase but ends with uppercase.
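
One possible way to avoid both the wrap-around and the single-character problem is to look ahead with peekable() instead of zipping the word with a shifted copy of itself. This is only a sketch under the ASCII-only assumption made earlier in the thread. The DELIMITERS constant is a name introduced here, and the code has been checked only against the test quoted above, not the full exercise test suite.

```rust
// Same delimiter set as in the earlier versions; the constant name is made up.
const DELIMITERS: [char; 6] = [' ', ',', ':', '_', '.', '-'];

pub fn abbreviate(phrase: &str) -> String {
    let mut acronym = String::new();
    for word in phrase.split(&DELIMITERS) {
        // Empty words (from consecutive delimiters) simply skip the inner loop.
        let mut chars = word.chars().peekable();
        let mut first = true;
        while let Some(c) = chars.next() {
            // An uppercase letter followed by a lowercase one, as in "HyperText".
            // At the end of a word peek() is None, so nothing wraps around.
            let starts_hump = c.is_ascii_uppercase()
                && chars.peek().map_or(false, |next| next.is_ascii_lowercase());
            if first || starts_hump {
                acronym.push(c.to_ascii_uppercase());
            }
            first = false;
        }
    }
    acronym
}
```

With this version, "Something - I made up from thin air" gives "SIMUFTA" and "mRNA" gives "M", which matches what the i < i_copy check was aiming for.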
