Is this code idiomatic?

I'm learning Rust and I've decided to do it by implementing algorithms. Is the code below idiomatic Rust, or (perhaps more likely) could it be written better?
Thank you.

pub fn longest_common_prefix(strs: Vec<String>) -> String {
    let mut i = 0;
    let first_str = &strs[0];
    'outer: while i < first_str.len() {
        for j in 0..strs.len() {
            match strs[j].chars().nth(i) {
                Some(cx) => {
                    if cx != first_str.chars().nth(i).unwrap() {
                        break 'outer;
                    }
                }
                None => break 'outer,
            }
        }

        i += 1;
    }
    let res: String = first_str[0..i].to_string();
    res
}

Not quite. Here are some issues:

  1. You should accept &[String] rather than a vector, since you have no need to take ownership of it.
  2. You use indexes in your loops instead of iterating directly.
  3. Calling .chars().nth(i) is expensive because it contains a loop.
  4. You don't handle the empty-list case: strs[0] panics if the vector is empty.
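To make point 3 concrete (my illustration, not from the original post): each call to chars().nth(i) decodes the UTF-8 from the start of the string, so calling it inside a loop over i is quadratic, while byte indexing is constant-time but counts bytes rather than characters:

```rust
fn main() {
    let s = "héllo";
    // chars().nth(i) re-walks the string from the beginning on
    // every call: O(i) work per call, O(n^2) inside a loop over i.
    assert_eq!(s.chars().nth(1), Some('é'));
    // Byte indexing is O(1), but indices count bytes, not chars:
    // 'é' occupies two bytes in UTF-8, so 5 chars take 6 bytes.
    assert_eq!(s.len(), 6);
    assert_eq!(s.chars().count(), 5);
}
```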

This would be better I think:

pub fn longest_common_prefix(strs: &[String]) -> &str {
    if strs.is_empty() {
        return "";
    }

    let first_str = &strs[0];
    
    for (i, character) in first_str.chars().enumerate() {
        for string in strs {
            if string.chars().nth(i) != Some(character) {
                return &first_str[0..i];
            }
        }
    }
    
    // entire string is common prefix
    first_str
}

Though there's still an issue here: &first_str[0..i] is incorrect if the string contains any non-ASCII characters, because i counts characters while string slicing works on byte offsets, so the slice can land in the middle of a multi-byte character and panic.
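For instance (my example): slicing at a byte offset that falls inside a multi-byte character panics at runtime, which is_char_boundary lets you check for in advance:

```rust
fn main() {
    let s = "αβ"; // each Greek letter is 2 bytes in UTF-8
    assert_eq!(s.len(), 4);
    // Offset 1 falls inside 'α', so &s[0..1] would panic with
    // "byte index 1 is not a char boundary".
    assert!(!s.is_char_boundary(1));
    // Offset 2 is the start of 'β', so &s[0..2] is fine.
    assert!(s.is_char_boundary(2));
    assert_eq!(&s[0..2], "α");
}
```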

Hi and thank you for your reply. Yes, your code looks much cleaner. Thanks.
Just one observation:

  1. Calling .chars().nth(i) is expensive because it contains a loop.

Your code also contains that^^^

Yeah, but that one is more annoying to avoid than the others. You'd probably need to find the prefix at the byte level to avoid it properly.

You can avoid it like this:

pub fn longest_common_prefix(strs: &[String]) -> &str {
    if strs.is_empty() {
        return "";
    }

    let first_str = &strs[0];
    
    for (i, byte) in first_str.as_bytes().iter().enumerate() {
        for string in strs {
            if string.as_bytes().get(i) != Some(byte) {
                
                let mut end = i;
                // guaranteed not to underflow since is_char_boundary(0) is always true
                while !first_str.is_char_boundary(end) {
                    end -= 1;
                }
            
                return &first_str[0..end];
            }
        }
    }
    
    // entire string is common prefix
    first_str
}

This uses is_char_boundary to step a few bytes backwards if the byte-level prefix ends in the middle of a multi-byte Unicode character.
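A small standalone sketch of that walk-back (my example, pretending the byte comparison diverged mid-character):

```rust
fn main() {
    let s = "aé!"; // bytes: 'a' (1) + 'é' (2, at offsets 1..3) + '!' (1)
    // Suppose the byte-level comparison diverged at offset 2,
    // which is in the middle of 'é'.
    let mut end = 2;
    while !s.is_char_boundary(end) {
        end -= 1; // cannot underflow: is_char_boundary(0) is always true
    }
    assert_eq!(end, 1);
    assert_eq!(&s[..end], "a"); // the torn 'é' is dropped from the prefix
}
```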

Thanks

This code misbehaves if the strings are not ASCII - it may return the wrong prefix or panic. Here's an example (playground):

fn main() {
    let strs = vec!["Префикс: русский".into(), "Префикс: english".into()];
     // expecting string "Префикс: "
    println!("Prefix: \"{}\"", longest_common_prefix(strs.as_slice()));
}

Yeah, I realized. The updated version should behave better on those.

Yes, I've already checked it, it indeed handled the presented case correctly - playground.

For a totally different-looking solution:

pub fn longest_common_prefix(strs: &[String]) -> &str {
    match strs.split_first() {
        None => "",
        Some((first, [])) => first,
        Some((first, others)) => first
            .char_indices()
            .find(|&(i, c)| others.iter().any(|s| s[i..].chars().next() != Some(c)))
            .map_or(first, |(i, _)| &first[..i]),
    }
}

Edit: Using slice patterns:

pub fn longest_common_prefix(strs: &[String]) -> &str {
    match strs {
        [] => "",
        [first] => first,
        [first, others @ ..] => first
            .char_indices()
            .find(|&(i, c)| others.iter().any(|s| s[i..].chars().next() != Some(c)))
            .map_or(first, |(i, _)| &first[..i]),
    }
}
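A quick sanity check of the slice-pattern version (my test inputs, with the function repeated so the snippet compiles on its own):

```rust
pub fn longest_common_prefix(strs: &[String]) -> &str {
    match strs {
        [] => "",
        [first] => first,
        [first, others @ ..] => first
            .char_indices()
            .find(|&(i, c)| others.iter().any(|s| s[i..].chars().next() != Some(c)))
            .map_or(first, |(i, _)| &first[..i]),
    }
}

fn main() {
    let strs: Vec<String> = vec!["flower".into(), "flow".into(), "flight".into()];
    assert_eq!(longest_common_prefix(&strs), "fl");
    // the empty case falls out of the `[]` arm
    assert_eq!(longest_common_prefix(&[]), "");
}
```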

Yet another solution, based on the realization that as you progressively iterate through the input strings, the common prefix can only get shorter.

It also zip()s together the character iterators of the two strings, since they necessarily have to agree position-by-position if they are to share a prefix. This (hopefully) improves performance and reduces the complexity of the predicate of find(). Playground.

pub fn longest_common_prefix<S: AsRef<str>>(strs: &[S]) -> &str {
    let mut strs = strs.iter();
    let first = match strs.next() {
        None => return "",
        Some(s) => s,
    };
    let mut prefix = first.as_ref();

    for s in strs {
        let s = s.as_ref();
        prefix = match prefix
            .char_indices()
            .zip(s.chars())
            .find(|&((_, c1), c2)| c1 != c2)
        {
            // first position where the two strings disagree
            Some(((i, _), _)) => &prefix[..i],
            // no disagreement, but `s` ran out before `prefix` did,
            // so `s` itself is the new common prefix
            None if s.len() < prefix.len() => &prefix[..s.len()],
            None => prefix,
        };
    }

    prefix
}
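One nice side effect of the S: AsRef<str> bound is that the function also accepts plain string literals, with no String allocations needed in a test. A standalone sketch (my example; the function body handles the case where a later string is shorter than the current prefix):

```rust
pub fn longest_common_prefix<S: AsRef<str>>(strs: &[S]) -> &str {
    let mut strs = strs.iter();
    let first = match strs.next() {
        None => return "",
        Some(s) => s,
    };
    let mut prefix = first.as_ref();

    for s in strs {
        let s = s.as_ref();
        prefix = match prefix
            .char_indices()
            .zip(s.chars())
            .find(|&((_, c1), c2)| c1 != c2)
        {
            Some(((i, _), _)) => &prefix[..i],                    // first mismatch
            None if s.len() < prefix.len() => &prefix[..s.len()], // s is shorter
            None => prefix,
        };
    }

    prefix
}

fn main() {
    // &str works as well as String thanks to AsRef<str>
    assert_eq!(longest_common_prefix(&["interspecies", "interstellar"]), "inters");
    assert_eq!(longest_common_prefix(&["flow", "flower"]), "flow");
}
```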

This does, however, decrease performance if the strings are long. E.g. if you have some

["really really long string ending in A",
 "really really long string ending in B",
 "reality"]

Your solution would traverse the first two strings almost entirely, while one actually only needs to look at the first 5 characters of each to find the "real" longest common prefix.

Sure, one could sort the strings by increasing length to mitigate that, but that would then require additional time and memory for the sorting. That's a trade-off, I guess.
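A minimal sketch of that pre-pass (my own sketch: sorting borrowed references so the original slice is untouched, at the cost of one Vec allocation):

```rust
// Return the strings ordered shortest-first; after this, the first
// element bounds how far any prefix comparison needs to scan.
fn shortest_first(strs: &[String]) -> Vec<&String> {
    let mut refs: Vec<&String> = strs.iter().collect();
    // O(n log n) pre-pass over references only
    refs.sort_by_key(|s| s.len());
    refs
}

fn main() {
    let strs = vec![
        "really really long string ending in A".to_string(),
        "really really long string ending in B".to_string(),
        "reality".to_string(),
    ];
    let sorted = shortest_first(&strs);
    assert_eq!(sorted[0].as_str(), "reality");
}
```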

Another implementation, just for fun:

pub fn longest_common_prefix(strs: &[String]) -> &str {
    let mut iters: Vec<_> = strs.iter().map(|s| s.char_indices()).collect();
    let exemplar = iters.pop().expect("No strings supplied");

    for (idx, c) in exemplar {
        if !iters.iter_mut().all(|it| it.next() == Some((idx, c))) {
            return &strs[0][..idx];
        }
    }
    // the exemplar (the last string) ran out, so it is itself the
    // common prefix; strs[0] might be longer than it
    strs.last().unwrap().as_str()
}

(Playground)


Edit: a variant that doesn't use heap allocation. This is quite similar to @alice's original solution, but without the nth() call:

pub fn longest_common_prefix(strs: &[String]) -> &str {
    for (idx,c) in strs[0].char_indices() {
        // Because `s[..idx]` represents a common prefix,
        // `idx` must be a valid character boundary in all the strings
        if !strs[1..].iter().all(|s| s[idx..].chars().next() == Some(c)) {
            return &strs[0][..idx];
        }
    }
    strs[0].as_str()
}

(Playground)
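Double-checking that variant against the earlier Cyrillic test case (my test, with the function repeated so the snippet runs standalone):

```rust
pub fn longest_common_prefix(strs: &[String]) -> &str {
    for (idx, c) in strs[0].char_indices() {
        // Because s[..idx] represents a common prefix at this point,
        // idx must be a valid character boundary in all the strings.
        if !strs[1..].iter().all(|s| s[idx..].chars().next() == Some(c)) {
            return &strs[0][..idx];
        }
    }
    strs[0].as_str()
}

fn main() {
    let strs: Vec<String> = vec!["Префикс: русский".into(), "Префикс: english".into()];
    // the two-byte Cyrillic characters don't trip up the slicing
    assert_eq!(longest_common_prefix(&strs), "Префикс: ");
}
```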


Sorry to be "that guy", but strings are complicated. Unicode has this complex hierarchy: u8 < char < grapheme cluster < Emoji ZWJ Sequence. Using grapheme clusters will get you most stuff, but will still get things like emoji rainbow flags wrong.

I'm not saying you should support all these things: you can choose what level of the hierarchy you want to support. But it is worth being aware of them.
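For a std-only taste of the problem (my example): a single glyph like 'é' can be one char or two (an 'e' plus a combining accent), so even a correct char-level prefix can visually tear an accent off:

```rust
fn main() {
    let composed = "é";          // U+00E9, one char, one glyph
    let decomposed = "e\u{301}"; // 'e' + combining acute accent: two chars, same glyph
    // The two renderings are not byte-equal, so a byte- or char-level
    // prefix algorithm treats them as different strings.
    assert_ne!(composed, decomposed);
    assert_eq!(decomposed.chars().count(), 2);
    // A char-level common prefix of ["e\u{301}x", "ey"] would be "e",
    // splitting the accent away from its base letter.
}
```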

EDIT That isn't "it" BTW, there is also different encodings (e.g. utf-8), ligatures (although most of the time you can break them apart for your prefix example). And I don't really understand the details of things like Han unification & CJK, Devanagari, etc.

This post was flagged by the community and is temporarily hidden.

Insofar as you consider the literally single-method-call vec.sort_by_key() "difficult looking".

Your tacit accusation that I merely want to show off is not appreciated. If you don't want to see solutions, don't ask questions.

I rather had this in mind when I said:
Look how even the simplest algorithm can be made to look difficult

Despite your post being flagged by somebody here, I have to say I totally sympathize with your point of view.

What I see there does not even give a hint of being a program in any language I have ever used. Figuring out how it works is going to take some intensive searching of documentation on all the functions/methods it is making use of.

I can't help thinking a couple of simple nested loops would solve the problem far more clearly.

As such, cover me in tomatoes if you like, I don't get the idea of striving for "idiomatic Rust", whatever that may be. Just give me clear code that works.


It wouldn't be hard to open-code the zip function and get something more imperative-looking with exactly the same semantics.

for s in strs {
    let s = s.as_ref();
    let mut prefix_i = prefix.char_indices();
    let mut other_c = s.chars();
    while let (Some((i, c1)), Some(c2)) = (prefix_i.next(), other_c.next()) {
        if c1 != c2 {
            prefix = &prefix[..i];
            break;
        }
    }
    // if `s` ran out before `prefix` did, the loop above exits
    // without truncating, so cap the prefix at `s`'s length
    if s.len() < prefix.len() {
        prefix = &prefix[..s.len()];
    }
}

But I'm not convinced that this is an improvement. It replaces obscure iterator combinators with obscure pattern matching, and I can't get rid of that without introducing redundant bounds checks.
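For completeness, here is the open-coded loop assembled into a full function (my assembly of the pieces above, same intent as the zip-based version):

```rust
pub fn longest_common_prefix<S: AsRef<str>>(strs: &[S]) -> &str {
    let mut strs = strs.iter();
    let mut prefix = match strs.next() {
        None => return "",
        Some(s) => s.as_ref(),
    };

    for s in strs {
        let s = s.as_ref();
        let mut prefix_i = prefix.char_indices();
        let mut other_c = s.chars();
        // advance both iterators in lockstep until they disagree
        while let (Some((i, c1)), Some(c2)) = (prefix_i.next(), other_c.next()) {
            if c1 != c2 {
                prefix = &prefix[..i];
                break;
            }
        }
        // a shorter `s` that fully matched is itself the prefix
        if s.len() < prefix.len() {
            prefix = &prefix[..s.len()];
        }
    }

    prefix
}

fn main() {
    assert_eq!(longest_common_prefix(&["abc", "ab"]), "ab");
    assert_eq!(longest_common_prefix(&["interspecies", "interstellar"]), "inters");
    assert_eq!(longest_common_prefix::<&str>(&[]), "");
}
```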
