I am working through the Rust book and there is a small exercise:
Convert strings to pig latin. The first consonant of each word is moved to the end of the word and ay is added, so first becomes irst-fay. Words that start with a vowel have hay added to the end instead (apple becomes apple-hay).
I can only solve this with complicated case distinctions and helper functions. Is there an elegant solution? Thanks!
Presumably you're referring to this section from the Rust Book:
If you look at the preceding content on how to use Rust String types, you will notice that it's typical to create a new String instance when changing the contents of an old String, e.g.
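As a minimal sketch of that pattern: the helper below builds a brand-new String from borrowed input instead of touching the original. The name move_first_to_end is made up for illustration, and it assumes a non-empty ASCII word.

```rust
/// Hypothetical helper (not from the book): returns a new String
/// and leaves the input untouched.
fn move_first_to_end(word: &str) -> String {
    // Assumption: `word` is non-empty ASCII, so byte index 1 is a
    // valid character boundary. Other input may panic here.
    let mut result = String::from(&word[1..]);
    result.push('-');
    result.push_str(&word[..1]);
    result.push_str("ay");
    result
}
```

Calling move_first_to_end("first") gives back a fresh String holding irst-fay, while the input stays as it was.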
I would suggest doing something like this. You are presumably thinking of editing the existing String in-place. While this is technically possible to do in Rust, it's not appropriate for the task at hand, as the resulting Pig Latin is longer than the original string.
Probably not a hugely elegant solution, since Rust Strings are Unicode-based (UTF-8) and it's an ASCII-esque exercise. There's no "vowel" Unicode property, so you'll have to write your own match or similar to detect those.
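One way such a hand-rolled check might look, assuming only the five ASCII vowels count (the name is_ascii_vowel is illustrative):

```rust
/// Hypothetical helper: true only for the ASCII vowels a, e, i, o, u
/// in either case. Accented letters such as 'é' are NOT treated as vowels.
fn is_ascii_vowel(c: char) -> bool {
    // to_ascii_lowercase leaves non-ASCII characters unchanged.
    matches!(c.to_ascii_lowercase(), 'a' | 'e' | 'i' | 'o' | 'u')
}
```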
Thanks. I have one question: in my solution, why can I not use a reference to char in for (i, c) in s.chars().enumerate()?
fn is_vowel(c: char) -> bool {
    c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u'
}

fn pig_latin(s: &String) -> String {
    // why is a reference not valid here for c?
    for (i, c) in s.chars().enumerate() {
        if i == 0 && is_vowel(c) {
            return String::from(s) + "-hay";
        }
        let res = String::from(&s[..i]) + &s[i+1..];
        let suffix = String::from(c) + "ay";
        return format!("{res}-{suffix}");
    }
    String::from("-")
}
You mean, why is c a char and not a &char? The chars() Iterator returns owned chars, not borrows. It can't return borrows, because a String is UTF-8 encoded (not a Vec<char> or the like). A char is always a 32-bit value, but its UTF-8 encoding could be 1, 2, 3, or 4 bytes.
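To see the difference between the fixed-size char and its variable-length encoding, here is a small sketch (utf8_lengths is a made-up helper name):

```rust
/// Hypothetical helper: the number of UTF-8 bytes each character
/// of `s` occupies inside the string.
fn utf8_lengths(s: &str) -> Vec<usize> {
    // `chars()` decodes the bytes on the fly and yields owned `char`s.
    s.chars().map(|c| c.len_utf8()).collect()
}
```

For a string like "aé€😀" this reports 1, 2, 3 and 4 bytes respectively, while std::mem::size_of::<char>() is 4 regardless of the character.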
It's inherently a problem that lends itself to edge cases and helper functions, so I wouldn't be too concerned.
My attempt at directly writing the code in one function is here: Rust Playground
It isn't all that pretty! The example input also shows some of the edge cases that would be even worse to handle in the output properly like capitalization and single-character words.
(Edit: Just realized all the first char logic could be done by str::strip_prefix()! That pretties it up quite a bit - I'll leave it as an exercise for the reader though)
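For anyone who wants a hint on that, this is roughly how a strip_prefix() version for a single word might look. It is only a sketch, not the playground code: the function name is invented, and it assumes "consonant" means any alphabetic character that is not an ASCII vowel.

```rust
/// Hypothetical single-word converter built around str::strip_prefix.
fn strip_prefix_pig_latin(word: &str) -> String {
    // Assumption: a consonant is any alphabetic char outside "aeiouAEIOU".
    let is_consonant = |c: char| c.is_alphabetic() && !"aeiouAEIOU".contains(c);
    match word.strip_prefix(is_consonant) {
        Some(rest) => {
            // Whatever was stripped from the front is the leading consonant;
            // its byte length is the difference between the two lengths.
            let first = &word[..word.len() - rest.len()];
            format!("{rest}-{first}ay")
        }
        // No leading consonant (this includes the empty string): append "-hay".
        None => format!("{word}-hay"),
    }
}
```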
You could also write this logic as a loop over input.char_indices() and a state machine of what you've seen, e.g.:
and writing to output when the state changes or even emit each character as you see them (which I think would need a bit more book-keeping?)
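A rough sketch of that state-machine idea, under my own assumptions: the only state is "where did the current word start, if any", words are runs of alphabetic characters, and both function names are invented for the example.

```rust
/// Hypothetical state machine over char_indices(): the state is the byte
/// index where the current word started, or None when outside a word.
fn pig_latin_state_machine(input: &str) -> String {
    let mut output = String::with_capacity(input.len());
    let mut word_start: Option<usize> = None;
    for (i, c) in input.char_indices() {
        match (word_start, c.is_alphabetic()) {
            // Entering a word: remember where it began.
            (None, true) => word_start = Some(i),
            // Leaving a word: emit it, then the separator character.
            (Some(start), false) => {
                push_word(&mut output, &input[start..i]);
                word_start = None;
                output.push(c);
            }
            // Outside a word: copy the character through unchanged.
            (None, false) => output.push(c),
            // Still inside a word: nothing to do yet.
            (Some(_), true) => {}
        }
    }
    // Flush a word that runs up to the end of the input.
    if let Some(start) = word_start {
        push_word(&mut output, &input[start..]);
    }
    output
}

/// Hypothetical helper: appends the pig latin form of one word.
fn push_word(output: &mut String, word: &str) {
    let mut chars = word.chars();
    let Some(first) = chars.next() else { return };
    if "aeiouAEIOU".contains(first) {
        output.push_str(word);
        output.push_str("-hay");
    } else {
        // `chars.as_str()` is the remainder after the first character.
        output.push_str(chars.as_str());
        output.push('-');
        output.push(first);
        output.push_str("ay");
    }
}
```

Because the byte indices come straight from char_indices(), every slice lands on a character boundary.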
It's an interesting exercise as it's not really directly supported by most of the existing standard string methods, while in e.g. JS you could directly implement this in a single input.replace(/\w+/g, (word) => ...) call.
It's an interesting exercise because there are multiple ways to tackle it, either iteratively or not, and chars are more difficult to handle than in most other languages where they're simply bytes.
Which is a great sign you might be under-utilizing some of the std built-in tools. Did you have a chance to go through the docs for the Iterator yet? The different kinds of splits for the &str itself? let/else statements? Never make things overly clever and fancy on the first try, either: it clearly says to convert "strings" and not "all valid UTF-8 sentences" there, for instance.
These sorts of exercises are most fun when you set yourself a clear objective, such as:
maximum readability (regardless of the line count)
fewest lines possible (+ poorest readability you can hit) [1]
maximum performance (no calls to String::from or into)
Lastly, don't forget that "elegance" is always in the eye of the beholder. There are standards and guidelines, but these never exist in a vacuum. Poke around the edges, see what you can find.
[1] See how much you can figure out here, for instance.
I took a poke at the str methods for my attempt, and it's surprisingly tricky to apply them to this problem! I'm sure someone could find something clever, but all the obvious suspects only work with restricted input (e.g. if you only have space-separated alphabetic words you could use split_whitespace(), but any punctuation pretty much breaks it).
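To make that concrete, here is a sketch of the restricted split_whitespace() approach. The function name is invented, and it deliberately assumes input made of space-separated alphabetic words.

```rust
/// Hypothetical converter that only handles whitespace-separated words.
fn pig_latin_words(input: &str) -> String {
    input
        .split_whitespace()
        .map(|word| {
            let mut chars = word.chars();
            match chars.next() {
                Some(first) if "aeiouAEIOU".contains(first) => format!("{word}-hay"),
                // `chars.as_str()` is the word without its first character.
                Some(first) => format!("{}-{first}ay", chars.as_str()),
                None => String::new(),
            }
        })
        .collect::<Vec<_>>()
        .join(" ")
}
```

It handles "first apple" fine, but "hello, world" comes out as "ello,-hay orld-way": the comma is treated as part of the word, which is exactly the punctuation problem.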
I forgot to mention that earlier: in your code above, your function won't work with UTF-8 characters that are longer than 1 byte (which I think is what @simonbuchan is alluding to). For example, try pig_latin(&"élan".to_string()) (which is the French for elk).
That's because you're using the index i in the slices s[..i] and s[i+1..]. Here i comes from enumerate(), so it counts characters, while string slices are indexed by bytes. You can only split a string on character boundaries, and "é".len() is 2, so s[1..] panics. If you want to take slices like that, you'll need to know at least the byte length of the character where you split.
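A small sketch of splitting off the first character safely, using its UTF-8 length rather than assuming 1 byte (split_first_char is a made-up name):

```rust
/// Hypothetical helper: splits a string into its first character and
/// the remainder, or returns None for an empty string.
fn split_first_char(s: &str) -> Option<(char, &str)> {
    let first = s.chars().next()?;
    // len_utf8() is the number of bytes `first` occupies, so this
    // index is guaranteed to be a character boundary.
    Some((first, &s[first.len_utf8()..]))
}
```

With "élan" this yields 'é' and "lan", whereas "élan".is_char_boundary(1) is false, which is why slicing at byte 1 panics.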
PS: On a side note, you could use a &str parameter in your function instead of &String, which relaxes the requirement a little and lets you do pig_latin("élan") or let s = "apple"; pig_latin(s); without the need to create a String. But it will have to return a String, of course.
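A tiny illustration of why &str is the more flexible parameter type (add_hay is a stand-in function invented for the example, not the full conversion):

```rust
/// Hypothetical stand-in: takes a borrowed &str, returns an owned String.
fn add_hay(s: &str) -> String {
    format!("{s}-hay")
}
```

Both add_hay("apple") and add_hay(&owned), where owned is a String, compile: a &String coerces to &str automatically, but not the other way round.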
That's the tricky thing about arbitrary input, isn't it? You can never know in advance how un/restricted it might get. Given the nature of the task, though - I doubt there's much point in bringing out the heavy guns into what is otherwise little more than an introduction to match ... and the like. Else there'll be nothing but a giant list of petty edge cases to account for.
Yeah, built-in utilities of that sort definitely have their time and place. Rust's regex doesn't overcomplicate things that much, but it definitely takes some getting used to.
It's just for information; normally you don't need to decode that yourself. For this exercise, you only need to know the length of the first char if you split the string. There are other ways to avoid it by iterating on the chars.
In my stab at it, I used char::len_utf8() but it feels a bit grimy.
Slightly nicer are things like the str::char_indices() iterator, which hands out the byte index of each character along with that character, and a bunch of other helpers for getting valid indices. Ideally, though, you avoid dealing with indices at all and build up a purer description of what you want to happen. Unfortunately the standard library doesn't have those pieces in place (yet?), which is where in a real project you might reach for a crate (other people's code packaged up, if you haven't seen that yet).
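For instance, char_indices() can tell you where the second character starts without any manual length arithmetic. A minimal sketch, with an invented function name:

```rust
/// Hypothetical helper: everything after the first character of `s`.
fn rest_after_first_char(s: &str) -> &str {
    // Byte index where the second character starts; if there is no
    // second character, fall back to the end of the string.
    let split = s.char_indices().nth(1).map_or(s.len(), |(i, _)| i);
    &s[split..]
}
```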
I think using itertools's chunk_by and dealing only with chars, the code can get really nice:
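use itertools::Itertools; // provides chunk_by (older itertools releases call it group_by)

// `input` is assumed to be a &str and `output` a fresh String.
fn pig_latin(input: &str) -> String {
    let mut output = String::new();
    // Split the input into alternating runs of alphabetic and other characters.
    for (is_word, mut chars) in &input.chars().chunk_by(|c| c.is_alphabetic()) {
        if is_word {
            let first_char = chars.next().expect("word has at least one char");
            match first_char {
                'a' | 'A' | 'e' | 'E' | 'i' | 'I' | 'o' | 'O' | 'u' | 'U' => {
                    // words starting in a vowel just have "-hay" added
                    output.push(first_char);
                    output.extend(chars);
                    output.push_str("-hay");
                }
                _ => {
                    // otherwise move the first character to the end with "ay"
                    output.extend(chars);
                    output.push('-');
                    output.push(first_char);
                    output.push_str("ay");
                }
            }
        } else {
            // punctuation, whitespace, digits: copy through unchanged
            output.extend(chars);
        }
    }
    output
}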