Minimal setup to swap first two chars in the input &str and return a String where the function signature is fixed

Hi all,

I have been trying to solve an exercism exercise using Rust. The function signature is fixed and I can't change it. The real task is more complex but knowing how to fix the code below will give me insight to solve my issue over at exercism.

I posted the below code to show my intention instead of the entire history of what I tried. I would appreciate it if someone can show me the most idiomatic way to return the text in the_string_to_modify with the first two characters swapped within the constraints of the modify_string function signature.

fn modify_string(the_string_to_modify: &str) -> String
{
    //I tried this but still cryptic error messages
    //let mut working_the_string_to_modify_copy = the_string_to_modify.clone();

    let mut aux = b'0';
    
    //Replaced the_string_to_modify with working_the_string_to_modify_copy
    //but not success in what I have tried
    aux = the_string_to_modify.as_bytes()[0];
    the_string_to_modify.as_bytes()[0] = the_string_to_modify.as_bytes()[1];
    the_string_to_modify.as_bytes()[1] = aux;
    
    the_string_to_modify.to_string()
}

fn main()
{
    let string_to_modify = "abc";
    let modified_string = modify_string(string_to_modify);
    println!("{modified_string}");
}

Modify and the_string_to_modify are misnomers. You cannot modify the contents of &str. What you actually need to do is to build a new String. Something like...

let mut new_string = String::new();
// ...somehow build the string up from the contents of `the_string_to_modify`...

Consider using chars and push. Then consider some optimizations by looking for methods that let you extend the String with a &str or with an iterator of chars.

(Or if you really want I can whip up an answer, but it's a better exercise to try again yourself.)
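For reference, here is one way the chars-and-push suggestion could look. Treat it as a minimal sketch rather than the one idiomatic answer. Returning inputs shorter than two chars unchanged is an assumption on my part; the real exercise may want something else.

```rust
/// Returns a new String with the first two chars of the input swapped.
/// Assumption: inputs with fewer than two chars are returned unchanged.
fn modify_string(the_string_to_modify: &str) -> String {
    let mut chars = the_string_to_modify.chars();
    match (chars.next(), chars.next()) {
        (Some(first), Some(second)) => {
            // The result has the same byte length as the input.
            let mut new_string = String::with_capacity(the_string_to_modify.len());
            new_string.push(second);
            new_string.push(first);
            // `as_str` yields the part of the input the iterator has not consumed yet.
            new_string.push_str(chars.as_str());
            new_string
        }
        _ => the_string_to_modify.to_string(),
    }
}

fn main() {
    let modified_string = modify_string("abc");
    println!("{modified_string}"); // prints "bac"
}
```

The input is never mutated; a fresh String is built from its pieces, which is why the fixed `&str -> String` signature is not a problem.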


If you're running into trouble with strings, you might like this:

It's my favourite article about them in Rust, cleverly disguised under fizzbuzz.


Another thing to know about Rust strings (String, str) is that they are UTF-8 encoded. That's a variable-width encoding -- what we call a single char may be encoded as 1, 2, 3, or 4 bytes. This also means a Rust string is not an array or vector of chars (those are always four bytes).
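To make the variable-width point concrete, here is a small sketch comparing byte length with char count. The helper name `byte_and_char_counts` and the sample strings are made up for illustration.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (length in bytes, number of chars).
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Pure ASCII: every char is one byte.
    println!("{:?}", byte_and_char_counts("abc")); // (3, 3)

    // 'a' (1 byte), U+00E9 (2 bytes), U+20AC (3 bytes), U+1F600 (4 bytes).
    let mixed = "a\u{e9}\u{20ac}\u{1f600}";
    println!("{:?}", byte_and_char_counts(mixed)); // (10, 4)

    // How many bytes each char takes when encoded as UTF-8.
    for c in mixed.chars() {
        println!("{:?} takes {} byte(s) in UTF-8", c, c.len_utf8());
    }

    // A standalone `char` value, by contrast, always occupies four bytes.
    println!("{}", size_of::<char>()); // 4
}
```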

Why does this matter? Almost all code exercise and competition problems assume that your text inputs are ASCII.[1] They do this implicitly by having problems around things like Pig Latin or anagrams, which can be implemented relatively trivially in terms of moving bytes around -- but not so trivially when bytes[2] don't correspond to characters.

The best workaround in such scenarios depends on the exact problem, but is usually either working mostly with bytes or working mostly with chars. In real-world situations there are often problems with both approaches, but the simplified problems you find in exercises almost always only give you ASCII, and both workarounds work fine with ASCII.
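As a rough illustration of the bytes-based workaround, the sketch below copies the bytes, swaps the first two, and re-validates the result as UTF-8. The name `swap_first_two_bytes` and the choice to return an `Option` are assumptions for this example, not part of the exercise.

```rust
/// Hypothetical helper: swaps the first two bytes of the input.
/// Returns None if the swapped bytes are no longer valid UTF-8.
/// Assumption: inputs shorter than two bytes are returned unchanged.
fn swap_first_two_bytes(input: &str) -> Option<String> {
    // Copy the bytes into an owned, mutable buffer; the &str itself stays untouched.
    let mut bytes = input.as_bytes().to_vec();
    if bytes.len() < 2 {
        return Some(input.to_string());
    }
    bytes.swap(0, 1);
    // Re-check UTF-8 validity; this always succeeds for ASCII input.
    String::from_utf8(bytes).ok()
}

fn main() {
    // ASCII input: bytes and chars line up, so the swap works.
    println!("{:?}", swap_first_two_bytes("abc")); // Some("bac")

    // U+00E9 is two bytes in UTF-8; swapping them yields invalid UTF-8.
    println!("{:?}", swap_first_two_bytes("\u{e9}a")); // None
}
```

With ASCII-only inputs the `None` case never happens, which is why this approach is usually good enough for exercises.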

More about using char as a workaround:

As it turns out, working with chars as a fixed-width encoding still won't make your Pig Latin-like programs robust! A char represents what Unicode calls a "scalar value". The problem is that what a human would call a character does not correspond to a scalar value, because different scalar values can be combined.[3]

So if you swap a couple chars, you might break up what looks like a single character to a human into multiple characters which don't make sense, or something that's not even proper text to a human.[4]

In other words, when considering how humans interpret text, Unicode itself is inherently a variable width representation. What humans consider a character (best approximated by a Unicode "grapheme cluster") may consist of an arbitrary number of scalar values.
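Here is a small std-only sketch of that pitfall. It swaps the first two chars of a string whose first visible character is written as two scalar values (a base letter plus a combining accent). The helper names `swap_first_two_chars` and `code_points` are invented for this example.

```rust
/// Hypothetical helper: swaps the first two chars (Unicode scalar values).
fn swap_first_two_chars(input: &str) -> String {
    let mut chars: Vec<char> = input.chars().collect();
    if chars.len() >= 2 {
        chars.swap(0, 1);
    }
    chars.into_iter().collect()
}

/// Hypothetical helper: lists the scalar values of a string as U+XXXX.
fn code_points(s: &str) -> String {
    s.chars()
        .map(|c| format!("U+{:04X}", c as u32))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 't'.
    // A reader sees two characters here, but there are three chars.
    let decomposed = "e\u{301}t";
    println!("{}", code_points(decomposed)); // U+0065 U+0301 U+0074

    // After the swap the accent is detached from its base letter.
    let swapped = swap_first_two_chars(decomposed);
    println!("{}", code_points(&swapped)); // U+0301 U+0065 U+0074

    // The precomposed form U+00E9 is a single char, so it survives the swap.
    println!("{}", code_points(&swap_first_two_chars("\u{e9}t"))); // U+0074 U+00E9
}
```

The two inputs look identical on screen, yet the char-based swap only does what a human would expect for the precomposed one.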

When attempting to be robust in this domain, you would use crates like unicode_segmentation. It can be tedious and isn't necessarily the best exercise for learning a new programming language; I don't think I've ever seen an online programming exercise that required working properly with grapheme clusters.

But you should know that the pitfalls of treating char like a human character exist.


  1. or at least Latin1 or some sort of fixed width encoding ↩︎

  2. or some other fixed unit length ↩︎

  3. So it's a bummer Rust calls scalar values char. But almost every programming language has similar naming potholes. ↩︎

  4. Once you've finished your function, see what it does to the input here. Or to the example grapheme clusters in the link below. ↩︎
