Doing it the Rust way

Hi Rustaceans !

I'm still learning Rust and trying to write more idiomatic code. I had this code that I wanted to make more 'functional'. The idea is to remove all control characters from a base32 Vec and, at the same time, count the occurrences of the '=' character.

        let mut content_b32: Vec<u8> = Vec::new();
        file_b32.read_to_end(&mut content_b32).expect("Unable to read the file");

// ### Question related part:
        let mut content_b32_filtered: Vec<u8> = Vec::with_capacity(content_b32.len());
        
        // Remove control char, and count padding chars.
        let mut padding: usize = 0;
        for v in &content_b32 {
            if *v > 31 {
                content_b32_filtered.push(*v);
                if *v == b'=' {
                    padding += 1;
                }
            }
        }

After a lot of time (really!) I got this:

        let mut content_b32: Vec<u8> = Vec::new();
        file_b32.read_to_end(&mut content_b32).expect("Unable to read the file");

// ### Question related part:
        let mut padding: usize = 0;

        // Closure: captures padding, filters out bytes < 32 and counts '=' symbols.
        let filter_counter = |x: &u8| -> bool  {
            if *x == b'=' { padding += 1; }
            *x > 31
        };

        let content_b32_filtered: Vec<u8> = content_b32.iter().cloned().filter(filter_counter).collect::<Vec<u8>>();

I guess this is more acceptable as Rust code, but I find the capture of the padding variable inelegant :nerd_face:. While discovering iterators and iterator adaptors, I realized that this subject is very, very wide and that there are probably better or more efficient ways (avoiding duplication?) to do the same thing. I would appreciate any suggestions.

Coming from a C++/Java background, I'm not comfortable writing and reading chained code. Do other devs coming from a similar background have any advice on getting used to such syntax?

Regards

If performance isn't super critical, I'd probably iterate twice to get the two values you want (untested):

let content_b32_filtered: Vec<u8> = content_b32.into_iter()
                                               .filter(|x| *x > 31)
                                               .collect();
let padding = content_b32_filtered.iter()
                                  // `iter()` yields `&u8` and `filter` passes `&&u8`, hence the double dereference.
                                  .filter(|x| **x == b'=')
                                  .count();

NB: This consumes content_b32 and relies on the fact that b'=' > 31, and is therefore preserved by the first filter.

Your solution seems OK.

Iterators are usually good at doing one thing only. If a task doesn't fit iterators well, then there's nothing wrong in using a for loop.
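
To make the for-loop option concrete, here is a minimal sketch that wraps the original loop in a function, so that the mutable state stays local and the caller gets both results at once. The name `filter_and_count` is hypothetical and not from the original code.

```rust
/// Hypothetical helper: keeps every byte above 31 and counts the b'='
/// padding bytes in a single pass over the input.
fn filter_and_count(input: &[u8]) -> (Vec<u8>, usize) {
    let mut filtered = Vec::with_capacity(input.len());
    let mut padding = 0;
    for &byte in input {
        if byte > 31 {
            filtered.push(byte);
            // b'=' is greater than 31, so it is only counted when it is kept.
            if byte == b'=' {
                padding += 1;
            }
        }
    }
    (filtered, padding)
}
```

Called as `let (filtered, padding) = filter_and_count(&content_b32);`, this leaves no mutable variable in the calling code.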


@2e71828
Because this is a learning project, speed isn't important. Nevertheless, I don't think that iterating twice is a better solution; that's just my point of view. Thanks for your answer!

Ok, I'll keep that in mind. Thanks.

You can also try a functional approach using fold (Rust Playground):

#[derive(Debug, Default)]
pub struct Analysis {
    filtered: Vec<u8>,
    padding: usize,
}

impl Analysis {
    pub fn new(filtered: Vec<u8>, padding: usize) -> Analysis {
        Analysis { filtered, padding }
    }
    pub fn with_capacity(capacity: usize) -> Analysis {
        Analysis {
            filtered: Vec::with_capacity(capacity),
            padding: 0usize,
        }
    }
    pub fn accumulate(mut self, item: u8) -> Analysis {
        if item % 2 == 0 {
            self.padding += 1
        }
        if item <= 0x0f {
            self.filtered.push(item)
        }
        self
    }
}

fn main() {
    let v: Vec<u8> = (0x00..=0xff).collect();
    let analysis = v
        .into_iter()
        .fold(Analysis::default(), Analysis::accumulate);
    println!("{:?}", analysis)
}

Output:

Analysis { filtered: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], padding: 128 }
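
The predicates in the example above are placeholders. Adapted to the original question (keep bytes above 31, count b'='), a fold might look like the sketch below. It uses a plain tuple as the accumulator instead of the `Analysis` struct, and the name `filter_and_count_fold` is hypothetical.

```rust
/// Hypothetical helper: single-pass filter and count expressed with `fold`.
/// The accumulator is a (filtered bytes, padding count) tuple.
fn filter_and_count_fold(input: &[u8]) -> (Vec<u8>, usize) {
    input.iter().fold(
        (Vec::with_capacity(input.len()), 0usize),
        |(mut filtered, mut padding), &byte| {
            if byte > 31 {
                filtered.push(byte);
                if byte == b'=' {
                    padding += 1;
                }
            }
            // Hand the updated accumulator to the next iteration.
            (filtered, padding)
        },
    )
}
```

The accumulator is moved in and out of the closure on every step, so no variable outside the closure is captured mutably.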

Many thanks for this answer. I will study it, since I have no idea what .fold() is about, nor Default!

That's a nice opportunity to dig more into iterators. Great.

Default is a trait for types that have a default value. For instance, numeric types have a default value of 0; the default value of Vec<T> is the empty vector. When defining a struct, if all the field types implement Default, you can derive it for the struct. In our case,

Analysis::default()
  ==
Analysis { filtered: Vec::<u8>::default(), padding: usize::default() }
  ==
Analysis { filtered: Vec::<u8>::new(), padding: 0usize }
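
As a small check of that equivalence, the sketch below derives `Default` on a copy of the struct and compares it with a value built by hand. `PartialEq` is added only so the two values can be compared, and `manual_default` is a hypothetical helper.

```rust
#[derive(Debug, Default, PartialEq)]
struct Analysis {
    filtered: Vec<u8>,
    padding: usize,
}

/// Hypothetical helper: builds field by field what the derived
/// `Default` implementation is expected to produce.
fn manual_default() -> Analysis {
    Analysis {
        filtered: Vec::new(),
        padding: 0,
    }
}
```

With these definitions, `Analysis::default() == manual_default()` holds.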

As for fold,

iterator.fold(init, f)

is equivalent to

let mut accumulator = init;
for item in iterator {
    accumulator = f(accumulator, item);
}
accumulator;
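
To see the equivalence on something concrete, here is a minimal sketch that sums bytes both ways. The function names are hypothetical.

```rust
/// Sums the bytes with `fold`; the accumulator starts at 0.
fn sum_with_fold(items: &[u8]) -> u32 {
    items.iter().fold(0u32, |acc, &item| acc + u32::from(item))
}

/// The same computation written as the explicit loop.
fn sum_with_loop(items: &[u8]) -> u32 {
    let mut accumulator = 0u32;
    for &item in items {
        // Mirrors `accumulator = f(accumulator, item)`.
        accumulator = accumulator + u32::from(item);
    }
    accumulator
}
```

Both functions return the same value for any input slice.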

This pattern of iteration is super common, so functional programming languages have a dedicated function for it, with some variants too.

See this comparison of the functional/imperative approach on Rust Playground.


Thanks for these crystal-clear explanations. Default is rather straightforward, but the fold pattern is new to me. It's a way of thinking that I'm not used to. Your equivalent code was easier for me to understand than the book's textual explanation. Sometimes code is better than words :smile:. I realize that I also have to learn more about the theory of functional languages. The playground link was also useful and explicit.


In functional languages one often doesn't have loops available and is forced to use recursion. The definition of foldl in Haskell

foldl f z []     = z                  
foldl f z (x:xs) = foldl f (f z x) xs

can be translated in Rust to (one would actually use the IntoIterator trait to make it even more versatile, but I'm simplifying a bit)

fn fold<I: Iterator, B, F>(mut iter: I, init: B, mut f: F) -> B
where
    F: FnMut(B, I::Item) -> B,
{
    if let Some(item) = iter.next() {
        fold(iter, f(init, item), f)
    } else {
        init
    }
}

The standard library std::iter::Iterator::fold uses a while let loop instead of recursion to avoid stack overflow, but is otherwise equivalent. You can check it here.

Edit: the recursive version I wrote here is tail recursive, but Rust does not guarantee tail-call optimization, so it may still blow the stack on long iterators.
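
For comparison, a loop-based counterpart with the same signature could be sketched as follows. This is my own illustration of the while let approach, not a copy of the standard library source, and `fold_loop` is a hypothetical name.

```rust
/// Loop-based counterpart of the recursive `fold` (a sketch, not the std source).
/// Stack usage stays constant regardless of the iterator's length.
fn fold_loop<I: Iterator, B, F>(mut iter: I, init: B, mut f: F) -> B
where
    F: FnMut(B, I::Item) -> B,
{
    let mut accum = init;
    while let Some(item) = iter.next() {
        accum = f(accum, item);
    }
    accum
}
```

Because there is no recursion, a call such as `fold_loop(0..1_000_000u64, 0u64, |acc, x| acc + x)` does not depend on any tail-call optimization.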

