Good / Needs Work? Atbash Cipher for Exercism

Hello Rust people, I'm implementing the Atbash Cipher for my Exercism Rust track. While my submission passes all tests, there is an 8 day waiting time for mentor feedback. I hope that by posting my submission here someone with more experience than I have might take a look and offer some advice to make it better.

I come from a little experience in C, and using &str and String presents some challenges for me. It seemed so much simpler to think of strings as arrays of characters, so manipulating the Rust string types is difficult for me. The C approach guided my implementation, but it feels a little round-about.

I am curious whether I am going about implementing the Atbash cipher here in a sensible Rust way, and what I might want to change to improve the implementation.

The exercise involves getting an input &str and outputting a String, with each character changed as per the atbash cipher, adding a space every 5 characters. Included is also a decode function. This all goes in a lib.rs. Here is the code:

// "Encipher" with the Atbash cipher.
pub fn encode(plain: &str) -> String {
    let mut coded: String = plain.to_string();

    coded.retain(|c| c.is_ascii_alphanumeric());
    coded.make_ascii_lowercase();

    let coded_no_spacing = String::from_utf8(
        coded
            .bytes()
            .map(|c| {
                if c.is_ascii_alphabetic() {
                    122 - c + 97
                } else {
                    c
                }
            })
            .collect(),
    )
    .unwrap();

    spacer(coded_no_spacing)
}

/// "Decipher" with the Atbash cipher.
pub fn decode(cipher: &str) -> String {
    let mut out = encode(cipher);
    out.retain(|c| c.is_ascii_alphanumeric());
    out
}

fn spacer(coded_no_spacing: String) -> String {
    let mut coded_no_spacing = coded_no_spacing.chars();

    let mut temp_char = coded_no_spacing.next();
    let mut counter = 0;
    let mut coded_with_spaces = "".to_string();
    while temp_char.is_some() {
        if counter % 5 == 0 && counter != 0 {
            coded_with_spaces.push(' ');
        }
        coded_with_spaces.push(temp_char.unwrap());
        temp_char = coded_no_spacing.next();
        counter += 1;
    }
    coded_with_spaces
}

It looks good! Strings and &str in rust can be a bit more verbose and complicated than in other languages for sure. You definatly have the right idea treating them as [char/byte].

let coded_no_spacing = coded.bytes()
        .map(|c| {
            if c.is_ascii_alphabetic() {
                (122 - c + 97) as char
            } else {
                c as char
        })
        .collect::<String>();


// `enumerate` gives (index, item)
let mut coded_no_spacing = coded_no_spacing.chars().enumerate();
let mut coded_with_spaces = String::new();
while let Some((counter, ch)) = coded_no_spacing.next() {
    if counter % 5 == 0 && counter != 0 {
        coded_with_spaces.push(' ');
    }
    coded_with_spaces.push(ch);
}

Mostly just style sorta stuff, it took me awhile to get comfortable with the tools provided by the std lib or even to know what tools were available. Good luck!

playground

2 Likes

Unless there is some explicit requirement to handle utf8, I'd work with Vec<u8> and &[u8] rather than String and &str, since those seem to be the types that you're actually needing. Even if the problem requires you to use &str in your API, I'd translate to u8 internally.

This looks like a job for chunks. One of the biggest challenges with learning rust is discovering all the wonderful functions present in the standard library, which isn't helped by the fact that they are so general, so you have to look for iterator methods and slice methods, as well as Vec or String methods.

3 Likes

Hey thanks for all the great tips! Here is what I like about your changes:

  • as char and .collect::<String>() to get the logic of my code recognized by the rust compiler, avoiding the unwrap, which wouldn't have panicked in the first place due to the logic of the code. This is my favorite change you made - I didn't know collect could work with types like that.
  • enumerate to avoid having to declare another variable as counter - much cleaner.
  • while let is just cleaner and allows for the temporary variables counter and ch to be used in a more streamlined way.

Thanks so much for looking at this! It is taking me a while to get used to the way rust makes you think - too often it seems like I'm just trying to translate Python or C into rust. This is exactly the kind of feedback that can help me get better at approaching things in a rusty way.

Ever think about becoming a mentor on Exercism?

2 Likes

I also posted this to codereview.stackoverflow.com here, and got some more helpful suggestions that might be improvements on the great suggestions above. Do people think this is an improvement or is it worse?

Here is the code:

/// "Encipher" with the Atbash cipher.
pub fn encode(plain: &str) -> String {
    let coded_no_spacing = plain.chars()
            .filter_map(|c| {
                if c.is_ascii_alphabetic() {
                    let letter = c.to_ascii_lowercase() as u8;
                    Some(char::from(b'z' - letter + b'a'))
                } else if c.is_ascii_alphanumeric() {
                    Some(c)
                } else {
                    None
                }
            })
            .collect();

    spacer(coded_no_spacing)
}

/// "Decipher" with the Atbash cipher.
pub fn decode(cipher: &str) -> String {
    let mut out = encode(cipher);
    out.retain(|c| c.is_ascii_alphanumeric());
    out
}

/// Spacer adds one space every five characters to help encode function.
fn spacer(coded_no_spacing: String) -> String {
    let mut coded_with_spaces = String::with_capacity(
        coded_no_spacing.len() + coded_no_spacing.len() / 5);

    for (index, ch) in coded_no_spacing.chars().enumerate() {
        if index % 5 == 0 && index != 0 {
            coded_with_spaces.push(' ');
        }
        coded_with_spaces.push(ch);
    }

    coded_with_spaces
}

Two thoughts:

  1. You are allocating to Strings potentially prematurely. Keep doing things as iterators until you need to collect to the final string. (HINT: you might do well to make a helper function that takes an &str and returns an iterator.)
  2. Good idea to use encode in the decode function, but because your encode function inserts spaces, you are essentially allocating a string, iterating over it to insert spaces and allocate a new string, then removing the spaces from that string. Maybe refactor the common part out to a helper function so that you can use that in both encode and decode but then only insert spaces for encode.

Hey good idea to refactor to put the common part of encode and decode into a helper function and only add the spacer function into encode. Not sure what you mean by your first point - could you give me another hint or be more specific?

Refactored code below:

/// "Encipher" with the Atbash cipher.
pub fn encode(plain: &str) -> String {
    let coded_no_spacing = atbash_no_spacing(plain);

    spacer(coded_no_spacing)
}

/// "Decipher" with the Atbash cipher.
pub fn decode(cipher: &str) -> String {
    atbash_no_spacing(cipher)
}

/// Atbash cipher uses the same change in text both encoding and decoding.
/// This helper function does the Atbash change without spacing changes.
fn atbash_no_spacing(to_atbash: &str) -> String {
        let coded_no_spacing = to_atbash.chars()
            .filter_map(|c| {
                if c.is_ascii_alphabetic() {
                    let letter = c.to_ascii_lowercase() as u8;
                    Some(char::from(b'z' - letter + b'a'))
                } else if c.is_ascii_alphanumeric() {
                    Some(c)
                } else {
                    None
                }
            })
            .collect();

    coded_no_spacing
}

/// Spacer adds one space every five characters to help encode function.
fn spacer(coded_no_spacing: String) -> String {
    let mut coded_with_spaces = String::with_capacity(
        coded_no_spacing.len() + coded_no_spacing.len() / 5);

    for (index, ch) in coded_no_spacing.chars().enumerate() {
        if index % 5 == 0 && index != 0 {
            coded_with_spaces.push(' ');
        }
        coded_with_spaces.push(ch);
    }

    coded_with_spaces
}

So your atbash_no_spacing function creates a variable equal to the output of some expression and then returns that variable. First suggestion would be to not create a variable and just return the output of your expression.
The second suggestion would be to not end that expression with a collect().

fn atbash_no_spacing<'a>(to_atbash: &'a str) -> impl Iterator<Item = char> + 'a {
    to_atbash.chars()
            .filter_map(|c| {
                if c.is_ascii_alphabetic() {
                    let letter = c.to_ascii_lowercase() as u8;
                    Some(char::from(b'z' - letter + b'a'))
                } else if c.is_ascii_alphanumeric() {
                    Some(c)
                } else {
                    None
                }
            })
}

Now your decode is atbash_no_spacing(cipher).collect() and your encode is spacer(atbash_no_spacing(plain)) if you change your spacer function to take an iterator and return a String.

4 Likes

Ooh that sounds neat! I'm definitely very new and out of my depth with the impl Iterator<Item = char> + 'a stuff you've got going on there, but the overall logic makes sense. No need to get a String from atbash_no_spacing only to make it an Iterator again in spacer and back to a String.

I suppose it is more optimized that way (fewer trips to the heap?) and just cleaner, but I'll need to understand lifetimes and iterators and how to put iterators into functions to use it - off to doc.rust-lang.org.

1 Like

The specific syntax is tricky, but you can get the idea from the compiler. If you remove the collect() call but leave the function signature as expecting to return String the compiler error message will tell you

expected type std::string::String
found type std::iter::FilterMap<std::str::Chars<'_>, [closure@src/lib.rs:3:25: 12:14]>

Now you just have to figure out how to get that FilterMap type in the return signature. Anything that implements iterator with the item of type Char would be able to call a filter_map with an appropriate closure. Thus impl Iterator<Item = char>.

OK I've gotten it to work. Looks like it is more simple to write a function which takes an Iterator as a parameter than return one. I'll have to look more closely at the syntax around what you wrote above in this function:

fn atbash_no_spacing<'a>(to_atbash: &'a str) -> impl Iterator<Item = char> + 'a

What confuses me now is why it is necessary to specify these lifetimes, which look as though they would be inferred. But I can't get it to compile without the lifetimes. However it is not necessary to use explicit lifetimes when passing an Iterator to a function as parameter. As with the spacer function:

fn spacer(coded_no_spacing: impl Iterator<Item = char>) -> String

Updated code:

/// "Encipher" with the Atbash cipher.
pub fn encode(plain: &str) -> String {
    spacer(atbash_no_spacing(plain))
}

/// "Decipher" with the Atbash cipher.
pub fn decode(cipher: &str) -> String {
    atbash_no_spacing(cipher).collect()
}

/// Spacer adds one space every five characters to help encode function.
fn spacer(coded_no_spacing: impl Iterator<Item = char>) -> String
{
    let mut coded_with_spaces = String::new();

    for (index, ch) in coded_no_spacing.enumerate() {
        if index % 5 == 0 && index != 0 {
            coded_with_spaces.push(' ');
        }
        coded_with_spaces.push(ch);
    }

    coded_with_spaces
}

/// Atbash cipher uses the same change in text both encoding and decoding.
/// This helper function does the Atbash change without spacing changes.
fn atbash_no_spacing<'a>(to_atbash: &'a str) -> impl Iterator<Item = char> + 'a {
    to_atbash.chars()
        .filter_map(|c| {
            if c.is_ascii_alphabetic() {
                let letter = c.to_ascii_lowercase() as u8;
                Some(char::from(b'z' - letter + b'a'))
            } else if c.is_ascii_alphanumeric() {
                Some(c)
            } else {
                None
            }
        })
}
1 Like

The difference is not the iterator as parameter versus return type, but rather the function returning a reference versus an owned String.
See what happens if you have the spacer function return &str:
https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=a9704961b5075424132d710341eefb57

I just tried it by making a main in the link you shared and changing encode to return &str but got an error. Is this what you meant me to try?

Here is the link to playground with spacer2 and the main to test, and below is the error.

error[E0106]: missing lifetime specifier
  --> src/main.rs:32:61
   |
32 | fn spacer2(coded_no_spacing: impl Iterator<Item = char>) -> &str
   |                                                             ^ help: consider giving it an explicit bounded or 'static lifetime: `&'static`
   |
   = help: this function's return type contains a borrowed value with an elided lifetime, but the lifetime cannot be derived from the arguments

Yes. When the return type is a reference (&str in this case) the compiler needs to know the lifetime for that reference. But when the return type is a String you don't need a lifetime.

Note this is actually a bad example b/c even if you specify the lifetime the compiler will complain that you are trying to return a reference to a value owned by the function. I'm just trying to make the point that you could have an Iterator as a parameter and still need to designate lifetime depending on the return type.