My Solution to Topic Chapter 8 Median Mean

Hello crustaceans :crab:, this is my 2nd day learning rust and I pretty much finally reached chapter 8. So I was looking for solutions to check if my code was right but didn't find any so here ya go. I'm actually open for criticism and ideas. I know my code isn't as fancy or anything like that. I came from TS btw.

use std::collections::HashMap;

fn main() {
    // 1.
    {
        // List of integers : Vectors
        let list_1 = vec![7, 23, 12, 7, 56, 89, 12, 45, 23, 7, 90, 56, 12, 89, 23];
        let list_2 = vec![2, 53, 34, 64, 23, 0];

        // Median
        println!("medium: {:?}", median(&list_1));
        println!("medium: {:?}", median(&list_2));

        // Mode
        println!("mode: {:?}", mode(&list_1));
        println!("mode: {:?}", mode(&list_2));
    }
}

fn median(list: &Vec<i32>) -> Option<i32> {
    // Sort
    let mut list = list.clone();
    let len = list.len();
    list.sort_unstable();
    println!("{:?}", list);
    if len % 2 == 0 {
        let (a, b) = {
            let mid = len / 2;
            (mid, mid - 1) // {-, -, -, -} <== get indexes [1, 2]
        };

        let num1 = list.get(a);
        let num2 = list.get(b);

        match (num1, num2) {
            (Some(num1), Some(num2)) => Some((num1 + num2) / 2),
            _ => None,
        }
    } else {
        let num = (len / 2) + 1;
        let mid = list.get(num);

        if let Some(a) = mid {
            Some(*a)
        } else {
            None
        }
    }
}

fn mode(list: &Vec<i32>) -> Vec<i32> {
    let mut map = HashMap::new();

    for &num in list {
        let count = map.entry(num).or_insert(0);
        *count += 1;
    }
    // Find the maximum value in the HashMap
    let max_value = map.values().cloned().max().unwrap_or(0);

    // Collect all keys with the maximum value
    let mut modes = Vec::new();
    for (&key, &value) in &map {
        if value == max_value {
            modes.push(key);
        }
    }

    modes
}

//* Pointers:
//1. Given a list of integers, use a vector and return the median (when sorted, the value in the middle position) and;
//   mode (the value that occurs most often; a hash map will be helpful here) of the list.

//2. Convert strings to pig latin. The first consonant of each word is moved to the end of the word and “ay” is added, so “first” becomes “irst-fay.” Words that start with a vowel have “hay” added to the end instead (“apple” becomes “apple-hay”). Keep in mind the details about UTF-8 encoding!

//3. Using a hash map and vectors, create a text interface to allow a user to add employee names to a department in a company. For example, “Add Sally to Engineering” or “Add Amir to Sales.” Then let the user retrieve a list of all people in a department or all people in the company by department, sorted alphabetically.
1 Like

Use clippy :wink:

    Checking playground v0.0.1 (/playground)
warning: writing `&Vec` instead of `&[_]` involves a new object where a slice will do
  --> src/main.rs:20:17
   |
20 | fn median(list: &Vec<i32>) -> Option<i32> {
   |                 ^^^^^^^^^
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg
   = note: `#[warn(clippy::ptr_arg)]` on by default
help: change this to
   |
20 ~ fn median(list: &[i32]) -> Option<i32> {
21 |     // Sort
22 ~     let mut list = list.to_owned();
   |

warning: manual implementation of `Option::map`
  --> src/main.rs:43:9
   |
43 | /         if let Some(a) = mid {
44 | |             Some(*a)
45 | |         } else {
46 | |             None
47 | |         }
   | |_________^ help: try: `mid.map(|a| *a)`
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#manual_map
   = note: `#[warn(clippy::manual_map)]` on by default
3 Likes

For more efficient median calculation, making use of a dedicated O(n) algorithm like <[T]>::select_nth_unstable could be useful, instead of full sorting.


But let’s stay with your implementation and look at it more closely: Your usage of list.get(…) in median is weird. It is written as if the get calls could ever fail, and perhaps they can, let’s check the edge cases…

vec![] - nope, that straight out panics (in debug mode) from integer underflow
vec![1] - that returns None
vec![2, 4] - returns Some(3), that’s good
vec![2, 4, 6] - returns Some(6), that’s incorrect

well… these already highlight some debugging left to do… once you handle the cases that can faul (only empty vec should return None), then indexing can be done in a way that would panic if the indexes are wrong, so you can catch your program bugs more easily. In case of indexing, this means you can just use list[…] instead of list.get(…) then; also spares a few match statements (and it would differently resolve the part where clippy already complained about a match above).


Regarding mode, same thing clippy points out for median, use a &[i32] parameter instead of &Vec<i32>. Then, the code is quite alright; I’d use copied instead of cloned in

-   let max_value = map.values().cloned().max().unwrap_or(0);
+   let max_value = map.values().copied().max().unwrap_or(0);

because that makes it more clear that no potentially expensive cloning is happening.

Then, to “collect all keys”, you could use the collect method. For example with filter and map

    map.iter()
        .filter(|(_, &v)| v == max_value)
        .map(|(&k, _)| k)
        .collect()

or with filter_map and then

    map.iter()
        .filter_map(|(&k, &v)| (v == max_value).then(|| k))
        .collect()
3 Likes

woah woah, where is that? it didn't show up on my vscode

noted! I'll try fixing it up. I kinda rushed it last night. thanks btw. and I'll try clippy!!

On the command line, that's cargo clippy. But you can also get it into vscode with rust-analyzer.

See here for how you might configure that: How to use Clippy in VS Code with rust-analyzer? - #2 by 17cupsofcoffee

1 Like

Here I fixed the errors and did a bit more research about unstable functions in vectors so I can use them and I think its good. I also just realized that I over complicated things. (specially in the get part).
Did a bit of refactoring.

use std::collections::HashMap;
 
fn main() {
    let list_1 = vec![2, 2, 1, 4, 2];

    let median = median(&list_1).unwrap_or(0);
    println!("Median: {}", median);

    println!("mode: {:?}", mode(&list_1));
}

fn median(list: &[u32]) -> Option<u32> {
    let mut list = list.to_owned();
    let len = list.len();

    match len {
        0 => None,
        1 => Some(list[0]),
        _ => get_median_value(&mut list, len),
    }
}

fn get_median_value(list: &mut [u32], len: usize) -> Option<u32> {
    if len % 2 == 0 {
        let len = len / 2;
        list.sort_unstable();

        // Slice in 2 and take first and last num
        let x = &list[..len][len - 1];
        let y = &list[len..][0];
        Some((x + y) / 2)
    } else {
        let (_, mid, _) = list.select_nth_unstable(len / 2);
        Some(*mid)
    }
}

fn mode(list: &[u32]) -> Vec<u32> {
    let mut map = HashMap::new();

    // Count total numbers of each
    for &num in list {
        let count = map.entry(num).or_insert(0);
        *count += 1;
    }

    // Find the maximum value in the HashMap
    let max_value = map.values().copied().max().unwrap_or(0);

    // Collect all keys with the maximum value
    map.iter()
        // filter out values that matches max value
        .filter(|(_, &value)| value == max_value)
        // take the key
        .map(|(&key, _)| key)
        .collect()
}

1 Like

I agree with @steffahn on just how awesome clippy is! I find myself running it as often as I build.


Rust's match statement is pretty powerful. Instead of matching the length of the list, you can actually match the list itself (technically it's called a 'slice' though :slight_smile:). So for median you could instead do:

match list {
    [] => None,                        // empty list
    [val] => Some(val),   // list only has 1 element
    _ => Some(get_median_value(list)),  // otherwise
}

This also ties into the fact that get_median_value doesn't need to return an Option.
Both of the branches always return Some, so might as well just drop the Optionness! :slight_smile:
And this makes logical sense. That function only gets called when the list isn't empty.
And a non-empty list will always have a median.

Personally, I'd be tempted to add assert!(!list.is_empty); at the top of get_median_value.
It doesn't change the behavior - if an empty list did get passed, both branches would panic anyways.
But I find asserts really helpful when reading the function (like a kind of documentation)

Also, you can pass custom messages, which are much nicer when debugging:
assert!(!list.is_empty(), "empty list was passed to 'get_media_value'!");
compared to what select_nth_unstable would panic with:
partition_at_index index 0 greater than length of slice 0
Although Rust has awesome error messages in general... some are still more cryptic than others!


For the mode function,
Rust has 3 'flavors' of iterators, corresponding to the 3 ways types can be passed around.

  • iter operates on an immutable borrow of a thing, and gives you immutable borrows (&u32 here).
  • iter_mut operates on a mutable borrow, and gives you mutable borrows (&mut u32 here).
  • into_iter operates on owned values, and gives you the values directly (u32 here).

(from the official docs: std::iter - Rust)

iter is always available. iter_mut can only be used if you have mutable access.
into_iter can only be used if you own the data, and once you've called into_iter, that data is consumed (unless you collect it into a new container or something).

Here I think into_iter is more appropriate than iter.
We own the HashMap and it's about to get thrown away anyways (when the function ends), so there's no problem consuming it. And because it works on values, you could drop the & in the closures:

-        .filter(|(_, &value)| value == max_value)
+        .filter(|(_, value)| value == max_value)

And same for the one in map. Right now you need these because iter gives you borrows, so you need to get to the actual value. Whereas into_iter will just give you the values with no fan-fare.

Also, I generally prefer into_iter (where it makes sense and is possible), since my gut feels like there's more room for optimization with it than the others. Since the values are being moved, and the container can be ripped apart, since it's being consumed anyways... buut take it with a grain of salt!! : v)

1 Like

A few things for your median.

  1. I would take &mut [] as the argument of median and let the user of the function clone the vector or slice if he wants to keep the old state and document that in the function doc comment. Of course you code argue, that that's too much to decide for the user and leave it &[] and clone it inside the function like you did. I'd always lean towards not allocating inside the function but let the user of the function make an allocation.
  2. If you decide, that select_nth_unstable is better than sort_unstable then I would use it for both the even and the odd case
use std::collections::HashMap;
 
fn main() {
    let list_1 = vec![2, 2, 1, 4, 2];

    let median = median(&mut list_1.clone()).unwrap_or(0);
    println!("Median: {}", median);

    println!("mode: {:?}", mode(&list_1));
}

/// Median has to partialy sort the slice. Clone it before passing it to this function if you want to avoid changing your data.
fn median(list: &mut [u32]) -> Option<u32> {
    let len = list.len();

    match len {
        0 => None,
        1 => Some(list[0]),
        _ => Some(get_median_value(list, len)),
    }
}

fn get_median_value(list: &mut [u32], len: usize) -> u32 {
    if len % 2 == 0 {
        let (_, x, rest) = list.select_nth_unstable(len/2 - 1);
        let (_, y, _) = rest.select_nth_unstable(0);
        (*x + *y) / 2
    } else {
        let (_, mid, _) = list.select_nth_unstable(len / 2);
        *mid
    }
}

Here are few more things I would change. See code.
Also, I added test cases so you can see how easy that is to do in rust. Doing this is realized, that you probably want to return a float for the median, otherwise you have integer division when the list has an even length. Or maybe you want that, but then I would also document that. I don't know how median is supposed to behave canonically.

/// median() has to partialy sort the slice. Clone it before passing it to this function if you want to avoid changing your data.
pub fn median(list: &mut [u32]) -> Option<f32> {
    match list {
        [] => None,
        list => Some(median_nonempty(list)),
    }
}

/// median_nonempty() has to partialy sort the slice. Clone it before passing it to this function if you want to avoid changing your data.
/// Panics if list is empty.
pub fn median_nonempty(list: &mut [u32]) -> f32 {
    let len = list.len();
    assert!(len > 0, "List was empty");
    if len % 2 == 0 {
        let (_, &mut x, rest) = list.select_nth_unstable(len / 2 - 1);
        let (_, &mut y, _) = rest.select_nth_unstable(0);
        (x as f32 + y as f32) / 2.0
    } else {
        let (_, &mut mid, _) = list.select_nth_unstable(len / 2);
        mid as f32
    }
}

#[test]
fn test_median() {
    assert_eq!(median(&mut []), None);
    assert_eq!(median(&mut [1]), Some(1.));
    assert_eq!(median(&mut [2, 1]), Some(1.5));
    assert_eq!(median(&mut [3, 1, 2]), Some(2.));
}

#[test]
#[should_panic(expected = "List was empty")]
fn test_median_nonempty_panics() {
    median_nonempty(&mut []);
}

I wouldn't use f32 here, rather f64. That would at least be able to represent all the results accurately. With f32, towards the largest u32 values, this type gets as inaccurate as only representing every 256th whole integer, so you actually make it a lot less precise than just rounding integers, in fact it's only sufficiently precise to represent the half integers below 223.

2 Likes

hey, thanks for your suggestions, It actually made my code look better specially in the get_median_value function. However I have a little problem in the filter part in mode, It throws an error that it should be &value

right here:

  •    .filter(|(_, &value)| value == max_value)
    
  •    .filter(|(_, value)| value == max_value)
    

i changed it to into_iter, but it still throws an error. how should I apply it?

Ahh, filter is a little special : v) I should of double-checked my suggestion.
Unlike most of the other iterator functions, filter doesn't operate directly on the types.
It always takes a reference to the elements, instead of the elements themselves.

If you hop over to the docs, and look just a bit under the examples section, you can see it even has a little note about how you sometimes need to juggle the references with it.

This is necessary because there's no way to preserve the values.
With map, you get a value, and you can return a value.
But filter only lets you return a bool. So if it did work directly with values, it would consume them, and they'd be lost forever (since you can't return them). So, instead it always works with references.


Anyways, rambling aside TL;DR either of these should work:

        .filter(|&(_, value)| value == max_value)

or

        .filter(|(_, value)| *value == max_value)

With the first one, what changed is that value isn't a reference anymore. It's a value, because you're using into_iter. But the entire type is a reference, because you're in a filter function. A subtle but existing difference : v)

Personally I prefer the 2nd, but this is purely a matter of preference.
The 2nd one doesn't care about where the & goes in the type, it lets the compiler figure it out, and just says "I know value is behind some kind of reference, so use * to de-reference to the actual value"