# How to sort a vector of strings by substring within a struct

Hi,

I feel this is a bit too complicated for me so i would need some help here. What I have is a vec of structs such that one element is a string and i would like to sort my vec of structs so that the pattern within a string is utilized in sorting: Example:

``````use regex::Regex;

#[derive(Debug)]
struct A {
x: String,
y: usize
}

fn main() {

let vec : Vec<A>  = vec![
A{x:"a2:33-7".to_string(), y:5},
A{x:"a1:23-75".to_string(), y:7},
A{x:"a10:233-76".to_string(), y:8},
A{x:"a1:3-77".to_string(), y:9},
A{x:"b2:3-78".to_string(), y:0},
A{x:"c2:263-79".to_string(), y:3},
A{x:"a34:264-79".to_string(), y:4}
];

println!("{:?}", vec);

// i need to sort A's by x s.t. the
// structs are first sorted by substring
// up to the first ':' and then int that follows
// up to '-'

// expected result:
//vec![
//  A{x:"a1:3-77".to_string(),  y:9},
//  A{x:"a1:23-75".to_string(), y:7},
//  A{x:"a2:33-7".to_string(), y:5},
//  A{x:"a10:233-76".to_string(), y:8},
//  A{x:"a34:264-79".to_string(), y:4},
//  A{x:"b2:3-78".to_string(), y:0},
//  A{x:"c2:263-79".to_string(), y:3},
//]

}

``````

I understand that i should utilize `sort_by()` but i am getting lost in parsing the String patterns when it comes to double sorting.

Thank you for all the advice you can provide!

Your description sounds as if `a10:233-76` should come before `a2:33-7`, as the `a10` and `a2` part would, lexicographically, be in that order.

But your â€śexpected resultâ€ť has them in the opposite order. If thereâ€™s more structure to the part before the `:`, you need to explain the general case. Currently, Iâ€™m seeing always one letter followed by an integer, but who knows what youâ€™re after.

You also didnâ€™t express the desired behavior if everything up until the `-` is identical; of if the whole string is identical and only `y` differs. Or is that expected to never happen? Do you want a fallback or a panic?

4 Likes

ah sorry , my clumsy wording... there will always be a fixed letter prefix (3 letters from a-z) and int is actually `int+` so not only one but one or more digits and i would like to do a numerical sort on those as well as the number following `:`.

as for the `y` scenario.. i have not preference. if all is teh same then i do not care about what happens to structs with different y's

Could you also give a representative set of examples for better illustration of what kind of data can come before the `:`? Does `int+` mean multiple integers or just multiple digits?[1] Are leading zeroes possible?

1. sounds like youâ€™re saying multiple digits â†©ď¸Ž

3 Likes

man i failed in explaining this properly. Sorry for that!

so as for a more representative case:

``````options for x:

aaa1:33-7
aaa2:454-8
aag21:4565-8544
abg121:75-8
...

so :
[a-z]{3}([0-9]+):([0-9]+)-([0-9]+)
``````

as for the leading zeros , they are non existent.

EDIT:

As for the size limit they are all u32 and will never be larger

Any size limits on the integers? Will they fit `u32` or `u64` or something?

Assuming it would fit a `u32`, hereâ€™s a quick implementation I could come up with

``````use core::cmp::Ordering;

#[derive(Debug, PartialEq)]
struct A {
x: String,
y: usize,
}

fn get_int(s: &mut &str) -> u32 {
let u = s.find(|c| c < '0' || c > '9').unwrap_or(s.len());
let (l, r) = s.split_at(u);
*s = r;
l.parse().unwrap()
}
fn consume(s: &mut &str, n: usize) {
*s = &s[n..];
}

fn compare(ref mut l: &str, ref mut r: &str) -> Ordering {
l[0..3].cmp(&r[0..3]).then_with(|| {
consume(l, 3);
consume(r, 3);
get_int(l).cmp(&get_int(r)).then_with(|| {
assert_eq!(l[0..1], *":");
assert_eq!(r[0..1], *":");
consume(l, 1);
consume(r, 1);
get_int(l).cmp(&get_int(r)).then_with(|| {
assert_eq!(l[0..1], *"-");
assert_eq!(r[0..1], *"-");
consume(l, 1);
consume(r, 1);
get_int(l).cmp(&get_int(r)).then_with(|| {
assert_eq!(l[..], *"");
assert_eq!(r[..], *"");
Ordering::Equal
})
})
})
})
}

fn main() {
let mut vec: Vec<A> = vec![
A{x:"aaa2:33-7".to_string(), y:5},
A{x:"aaa1:23-75".to_string(), y:7},
A{x:"aaa10:233-76".to_string(), y:8},
A{x:"aaa1:3-77".to_string(), y:9},
A{x:"aab2:3-78".to_string(), y:0},
A{x:"aac2:263-79".to_string(), y:3},
A{x:"aaa34:264-79".to_string(), y:4}
];

println!("{:#?}", vec);

vec.sort_by(|l, r| compare(&l.x, &r.x));

println!("{:#?}", vec);

// expected result:
let expected = vec![
A{x:"aaa1:3-77".to_string(),  y:9},
A{x:"aaa1:23-75".to_string(), y:7},
A{x:"aaa2:33-7".to_string(), y:5},
A{x:"aaa10:233-76".to_string(), y:8},
A{x:"aaa34:264-79".to_string(), y:4},
A{x:"aab2:3-78".to_string(), y:0},
A{x:"aac2:263-79".to_string(), y:3},
];
assert_eq!(vec, expected);
}
``````

Rust Playground

One could probably also use a regex, maybe even more efficiently. Also Iâ€™m not verifying the first 3 characters are `a-z`. Also I now noticed that it doesnâ€™t verify the validity of later parts at all when the earlier parts are already comparing non-equalâ€¦ I suppose my verifying story is a bit inconsistent â€“ my reasoning while writing the code was that itâ€™d help me more quickly find any bugs I might have introduced â€“ you could simply remove all the `assert`s, anyways (and then the whole final `then_with(|| Ordering::Equal)` can go, too.)

Iâ€™ll be offline for a while now, but others can certainly answer and follow-up questions (e.g. the `ref mut l: &str` argument patterns are a bit funky ), or suggest improvements / alternatives

1 Like

I suggest parsing your `String`s into a `struct` with fields that represent the data you care about within the `String`, so that you can take advantage of strong typing, derived `Ord` implementations, and so on.

Here's a sketch.

Notes:

• I chose `u16` for the numbers arbitrarily and assumed their built-in `FromStr` implementations were appropriate
• Slicing a `str` usually takes more care than this code needs; `s[1..]` is relying on the `('a'..='z').contains` check to only allow letters whos length in UTF8 is 1
• Errors can be improved with enough elbow grease
• I didn't even implement `Display` or `Error` for `XParseErr` for example
• Unwrapping in the sort is not a good approach and is just for illustration
• You could use the parsed version everywhere after parsing
• You could pre-check that everything is well formed before sorting
• You could have a newtype holding a `String` you know is well formed and have a `From<WellFormedA> for ParsedA` implementation or such
• Etc
• You could implement `Display` for `ParsedX` and `From<ParsedA>` for `A`, etc
3 Likes

Though i marked the quinedot as a solution I find yours as helpful as his/hers but i do not know hoe to mark both thank you both for your help

This topic was automatically closed 90 days after the last reply. We invite you to open a new topic if you have further questions or comments.