Can this sorting code be rewritten in a more elegant way?

Motivated by the question in [Solved] Best way to sort a string of key value pairs - #9 by vitalyd about the best way to sort key/value pairs, I thought about sorting with a known distribution of keys. If each key is unique and a bound is known then it can be faster to just insert each value in its position. This way the typical logarithmic factor of the sorting algorithm is avoided.

For that I wrote the following code:

fn main() {
    let input = "Key0=value,Key3=vvalue,Key2=vealue,Key10=vralue,Key4=vtalue,Key1=vyalue,Key17=vualue".to_owned();
    let mut kvs=vec![None;100];
    for item in input.split(",").map(str::trim)
    {
        let mut item_s=item.split("=");
        let key=item_s.next().unwrap();
        let value=item_s.next().unwrap();
        let keyid:usize=key["Key".len()..].parse().unwrap();
        kvs[keyid]=Some(value);
    }
    println!("{:?}", itertools::join(kvs.iter().enumerate().filter_map(|(keyid,opt)|match opt{
        None => None,
        Some(value) => Some(format!("Key{}={}",keyid,value))
    }),","));
}

playground link: Rust Playground

I feel this code considerably dirty and I thought that maybe someone can make it more elegant. More concretely: it is possible to rewrite the loop in a functional way?

I managed to "trim" your main loop to:

    let mut kvs = Vec::
      from_iter(
        (0 .. 100).map(|_| None)
      )
    ;
    input
      .split(",")
      .filter_map(|s| s.trim().parse().ok())
      .for_each(|Key(id, value)| kvs[id] = Some(value))
    ;

... but only because I extracted / factored out the individual parsing logic into:

struct Key(usize, Box<str>);

const KEY: &str = "Key";

impl str::FromStr for Key {
  type Err = ();

  fn from_str(s: &str) -> Result<Self, Self::Err>
  {
    let mut s = s.split("=");
    match (s.next(), s.next()) {
      (Some(key), Some(value)) if key.starts_with(KEY) => {
        Ok(Key(key[KEY.len() ..].parse().map_err(mem::drop)?, value.into()))
      },
      _ => Err(()),
    }
  }
}
1 Like

Aside: it is in retrospect obvious, but I've not come across using destructuring patterns in a closure arg like this before, and I like it.

1 Like

Oh, there are tons of places that you can sneak in a pattern match:

let (a, b) = get_pair();
for (a, b) in get_pairs() {}
let _ = get_pairs().iter().map(|(a, b)| {});
match get_pair {
    (MyEnum::MyType {
        foo,
        bar,
        (i, j)
    }, _) => {},
    (_, MyStruct {iter, ..}) => {},
    _ => {}
}
fn foo((a, b): ((), ())) {}

Yeah. I use tuple patterns all the time as closure args, but it's never occurred to me that this implies all the pattern syntax is available there. As I say, obvious in retrospect, but a nice realisation.

2 Likes

Thank you. That code is a lot nicer. I forgot about the existence of the for_each method. Extracting the pair from the str within an impl str::FromStr is also very nice touch.

1 Like

Matter of taste I suppose, but I would prefer to use a Regex for the string parsing.

    let rx = regex::Regex::new(r"Key(\d+)=(\w+)").unwrap();
    rx.captures_iter(input).for_each(|cap| {
        let id = cap.get(1).unwrap().as_str().parse::<usize>().unwrap();
        let val = cap.get(2).unwrap().as_str();
        // ...
    });

(Don't let the unwraps fool you, none of these steps can fail)

1 Like

Unless your regex was statically precompiled your solution will most probably not be as performant as manually parsing.

Since the post already tries to optimize the complexity of a sorting algorithm from O(n · log(n)) to O(n + M) (where M is the upper bound), performance seems to be important, here

Can also use fold and ArrayVec to trim things down a bit - not sure it's significantly better, however:

let kvs = input
        .split(",")
        .map(|kv| kv.trim().split("=").collect())
        .fold(vec![None; 100], |mut v, kv: ArrayVec<[_; 2]>| {
            let key = kv[0]["Key".len()..].parse::<usize>().unwrap();
            v[key] = Some(kv[1]);
            v
        });

playground

2 Likes

Interesting. What does the collect do if there aren't enough elements? panic! ?

I am really looking forward to a TryFromIter (and .try_collect) trait (and symmetric method) so that arrays can implement those in ::std

It sets the length of the ArrayVec accordingly (so if there’s no “=“, say, it’ll set the length of the ArrayVec to 1). Trying to index beyond that length will panic (in this example, inside the fold() closure).

There’s a try_fold that can be likely used to make this handle errors/edge cases, but this was just demo code :slight_smile:.

1 Like

Oh yes sure, it's not an array but a vec. I was thinking more of something akin to my ::stackvec::TryFromIterator (a crate that ended up being too similar to ::arrayvec despite its different initial purpose).