Which implementation is faster and why?

Hello Rustaceans,

During my first month of learning Rust, I came to the conclusion that Rust does many things differently from other, similar low-level languages. That raises many questions that I would like to ask here and hopefully find answers to.

Let's assume that I have a big chunk of bytes, e.g. a large book, and I would like to exclude all words that contain vowels, except for the vowel 'a'.

Which implementation is faster in Rust? Here I am only concerned about speed; readability is not really something I am worried about, and as long as it's fast, it's good. I came up with this example, which looks similar to something I would do in C:


fn main() {
    let bytes = b"you can search throughout the entire universe for someone who is more deserving of your love and affection than you are yourself, and that person is not to be found anywhere. You yourself, as much as anybody in the entire universe deserve your love and affection";


    // Deny table indexed by letter (a = 0 ... z = 25): `true` marks e, i, o, u and y,
    // so 'a' is the only vowel allowed in a word.
    let deny = [
        false, false, false, false, true, 
        false, false, false, true, false, 
        false, false, false, false, true, 
        false, false, false, false, false, 
        true, false, false, false, true, 
        false,
    ];
    
    let mut denied = false;
    let mut start_idx = 0;
    let mut end_idx = 0;
    
    // To keep the example simple, I ignore the edge cases around the first and
    // last words in the string.
    
    for b in bytes.iter() {
        match *b {
            b'A'..=b'Z' => {
                // Map 'A'..='Z' onto table indices 0..=25.
                if deny[(*b - b'A') as usize] {
                    denied = true;
                }
            },
            b'a'..=b'z' => {
                // Map 'a'..='z' onto table indices 0..=25.
                if deny[(*b - b'a') as usize] {
                    denied = true;
                }
            },
            b' ' => {
                if !denied {
                    let word;
                    
                    unsafe {
                        word = String::from_utf8_unchecked(bytes[start_idx..end_idx].to_vec());
                    }
                    
                    println!("{}", word);
                }
            
                denied = false;
                start_idx = end_idx + 1;
            },
            _ => { }
        }
        
        end_idx += 1;
    }
}

In Rust, would it make a difference if I converted the stream of bytes to a string and iterated over it? Should I continue using this approach, or is there a better way of doing it? I am open to all suggestions as long as they provide insight into speed.

Thank you.

Converting back and forth between bytes and strings, using unsafe, and mutating index state by hand at the same time is both quite ugly and dangerous: if the indexing is ever wrong on non-ASCII input, a slice can split a multi-byte character, and String::from_utf8_unchecked will then build an invalid String, which is undefined behavior.
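If the goal is only to view a byte range as text, both the unsafe block and the to_vec() copy can be avoided with the checked conversion from the standard library. A minimal sketch, assuming the input is meant to be valid UTF-8; word_at is a hypothetical helper name, not something from the original post:

```rust
use std::str;

/// Hypothetical helper: borrows `bytes[start..end]` as a `&str` without copying.
/// Returns `None` if the range is out of bounds or the bytes are not valid UTF-8.
fn word_at(bytes: &[u8], start: usize, end: usize) -> Option<&str> {
    // `get` is the non-panicking counterpart of `bytes[start..end]`.
    let slice = bytes.get(start..end)?;
    // `from_utf8` validates the bytes instead of trusting them.
    str::from_utf8(slice).ok()
}

fn main() {
    let bytes = b"you can search";
    if let Some(word) = word_at(bytes, 4, 7) {
        println!("{}", word);
    }
}
```

The validation is a linear pass over the slice, so it is not free, but it turns a possible source of undefined behavior into an ordinary None.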

I'm not entirely sure what purpose the deny array serves, but if you only want to retain words containing an 'a', you could simply write:

let string = "you can search throughout the entire universe for someone who is more deserving of your love and affection than you are yourself, and that person is not to be found anywhere. You yourself, as much as anybody in the entire universe deserve your love and affection";

for word in string.split_whitespace() {
    if word.contains(['a', 'A'].as_ref()) {
        println!("{}", word);
    }
}

This is as fast as it gets, contains no unsafe and no complicated state mutation, and is a lot shorter and more readable, because it says exactly what the intent of the code is, no more.
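For comparison, the rule that the question's deny table actually encodes (skip any word containing e, i, o, u or y, in either case) can be written in the same iterator style. A minimal sketch; is_denied and allowed_words are hypothetical names, and treating 'y' as a denied letter is taken from the table in the question:

```rust
/// Hypothetical helper: true for the letters the question's table marks as denied.
fn is_denied(b: u8) -> bool {
    matches!(b.to_ascii_lowercase(), b'e' | b'i' | b'o' | b'u' | b'y')
}

/// Hypothetical helper: yields the whitespace-separated words that contain
/// no denied letter. Punctuation stays attached to its word.
fn allowed_words<'a>(text: &'a str) -> impl Iterator<Item = &'a str> {
    text.split_whitespace()
        .filter(|word| !word.bytes().any(is_denied))
}

fn main() {
    let text = "you can search throughout the entire universe and that";
    for word in allowed_words(text) {
        println!("{}", word);
    }
}
```

Unlike the manual loop, this also handles the first and last words. Whether it is faster or slower than the byte-indexed version is something to measure on real input with an optimized build rather than assume.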
