Faster alternative to Vec's retain method

Hello there,
My profiler (callgrind) has shown that the following function is the most time-consuming part of my program. I reduced its uses as much as I could, but the performance remains unsatisfying to me. Is there a way to optimize this some more? I’ll point out that the size of arr is never greater than 9 and the only possible values are 1, 2, 3 … 8 and 9, w/o repetitions. Code below.

fn missing_chars(arr: Vec<char>) -> Vec<char> {
    let mut res: Vec<char> = vec!['1', '2', '3', '4', '5', '6', '7', '8', '9'];
    res.retain(|el| !arr.contains(el));
    res
}

There are many ways to create a faster function as long as you modify the code that calls it to allow a different and more efficient solution. This is an untested and un-benchmarked solution that’s probably faster:

#![feature(in_band_lifetimes)]

fn missing_chars1(arr: Vec<char>) -> Vec<char> {
    let mut res: Vec<char> = vec!['1', '2', '3', '4', '5', '6', '7', '8', '9'];
    res.retain(|el| !arr.contains(el));
    res
}

fn missing_chars2(arr: &[u8], buffer: &'a mut [u8; 9]) -> &'a [u8] {
    let mut digits = [false; 9];
    for &d in arr {
        debug_assert!(d > b'0' && d <= b'9' && !digits[usize::from(d - b'1')]);
        digits[usize::from(d - b'1')] = true;
    }
    let mut len = 0;
    for i in 0u8 .. 9 {
        if !digits[usize::from(i)] {
            buffer[len] = i + b'1';
            len += 1;
        }
    }
    &buffer[.. len]
}

fn main() {
    println!("{:?}", missing_chars1(vec!['3', '6']));
    let mut buffer = Default::default();
    println!("{:?}", missing_chars2(&[b'3', b'6'], &mut buffer));
}

If you perform further changes to the surrounding code you can probably invent an even faster function (like one that returns a single u32 that represents the complement set of the input digits).