What is the fastest way to replace all occurrences of a character in a `Vec<u8>`?


#1

I have tried my best to do this, but my approach is apparently neither concise nor efficient.

Are there any built-in functions to replace a char or string in a Vec<u8> without having to convert the buffer to a String first?

use std::ptr;
use std::str;

fn main() {
    let find = 'a' as u8;
    let mut replace_str = "<-----replace----->".as_bytes().to_vec();

    replace(&mut read_file(), find, &mut replace_str);
}
fn read_file() -> Vec<u8> {
    "a1aa22aaa333a".as_bytes().to_vec()

}
fn replace(v: &mut Vec<u8>, find: u8, replace: &mut Vec<u8>) {
    let mut count = 0;
    let mut vec = Vec::new();

    let pointer = v.as_ptr();
    let mut mark_where_should_start = 0;
    let mut never_found = true;


    loop {

        if v[count] == find {
            never_found = false;
            if count != 0 && count - mark_where_should_start > 0 {
                let mut temporary = Vec::with_capacity(count - mark_where_should_start);
                unsafe {

                    temporary.set_len(count - mark_where_should_start);
                    ptr::copy_nonoverlapping(pointer.offset(mark_where_should_start as isize),
                                             temporary.as_mut_ptr(),
                                             temporary.len());
                }

                vec.append(&mut temporary);
            }
            mark_where_should_start = count + 1;

            vec.append(&mut replace.clone());
        }
        count += 1;
        if count + 1 > v.len() {
            break;
        }
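    }
    // Copy whatever follows the last match; without this the tail of `v` is dropped.
    vec.extend_from_slice(&v[mark_where_should_start..]);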
    if never_found {
        println!("Nothing has been found ");
    } else {
        println!("{:?}", str::from_utf8(&vec).unwrap());
    }
}



#2

Well, first of all, there are no string manipulation functions that work on things that aren’t Strings. That would somewhat defeat the purpose of having string types at all.

You can always just convert between String and Vec<u8> with nothing more expensive than a UTF-8 validity check.
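As a rough sketch of that round trip, assuming the buffer really is valid UTF-8 (the function name `replace_via_string` is made up for illustration): `String::from_utf8` takes ownership of the bytes and only validates them, `str::replace` builds a new `String` with the substitutions, and `into_bytes` hands the buffer back without another copy.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper: replaces every `find` character by going through `String`.
/// Returns an error if `bytes` is not valid UTF-8.
fn replace_via_string(
    bytes: Vec<u8>,
    find: char,
    replacement: &str,
) -> Result<Vec<u8>, FromUtf8Error> {
    // Validates the bytes and reuses the existing allocation.
    let text = String::from_utf8(bytes)?;
    // `str::replace` allocates a new String containing the result.
    Ok(text.replace(find, replacement).into_bytes())
}
```

If the data is not guaranteed to be UTF-8, this returns an error instead of a result, which is one reason to stay with `[u8]`.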

Assuming there’s some other reason to be sticking to [u8], the below should likely suffice. Unless you get lucky with the optimiser, this is unlikely to be optimal, but it should be a good enough starting point. If nothing else, it’s much easier to read, and doesn’t use unsafe (which you shouldn’t be touching with a 50ft barge pole unless you really know what you’re doing).

fn main() {
    let find = b'a';
    let replace_str = b"xXx";

    let result = replace(&read_file(), find, replace_str);

    println!("result: {:?}", result);
}

fn read_file() -> Vec<u8> {
    b"a1aa22aaa333a"[..].to_owned()
}

fn replace(search: &[u8], find: u8, replace: &[u8]) -> Vec<u8> {
    let mut result = vec![];

    for &b in search {
        if b == find {
            result.extend(replace);
        } else {
            result.push(b);
        }
    }

    result
}
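One possible tweak, offered as a sketch and not as a measured improvement: count the matches first so the output can be allocated once at its final size. The name `replace_presized` is hypothetical, and whether the extra pass pays off depends on the input, so it is worth benchmarking before adopting it.

```rust
/// Hypothetical variant of `replace` that sizes the output up front.
fn replace_presized(search: &[u8], find: u8, replace: &[u8]) -> Vec<u8> {
    // First pass: how many bytes will be substituted?
    let hits = search.iter().filter(|&&b| b == find).count();

    // Each hit drops one byte and adds `replace.len()` bytes.
    let mut result = Vec::with_capacity(search.len() - hits + hits * replace.len());

    // Second pass: same logic as the simple loop, but without reallocations.
    for &b in search {
        if b == find {
            result.extend_from_slice(replace);
        } else {
            result.push(b);
        }
    }

    result
}
```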

#3

:slightly_smiling: Thanks for your quick response; it’s really neat to see.


#4

I tried to test the performance of both:

My Version:

running 1 test
test bench_replace ... bench:         826 ns/iter (+/- 20)

Your Version:

running 1 test
test bench_replace ... bench:         336 ns/iter (+/- 124)

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured

Your version is much faster than mine.

:sweat_smile: The only reason I touched the dangerous unsafe code is that I couldn't find a way to split the Vec neatly, so I copied the unsafe code from the library source code.

Thanks again.
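On splitting the buffer neatly without unsafe: slices have a `split` method that takes a predicate and yields the sub-slices between matches. A minimal sketch using it (the name `replace_split` is made up here, and this has not been benchmarked against the versions above):

```rust
/// Hypothetical helper: replaces every `find` byte by splitting on it
/// and re-joining the pieces with `replacement`.
fn replace_split(haystack: &[u8], find: u8, replacement: &[u8]) -> Vec<u8> {
    let mut result = Vec::with_capacity(haystack.len());

    // `split` yields the (possibly empty) chunks between matching bytes.
    for (i, chunk) in haystack.split(|&b| b == find).enumerate() {
        // Every chunk after the first was preceded by a match.
        if i > 0 {
            result.extend_from_slice(replacement);
        }
        result.extend_from_slice(chunk);
    }

    result
}
```

Because `extend_from_slice` copies whole chunks at a time, this does the same job as the manual `ptr::copy_nonoverlapping` calls while staying in safe code.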