[Solved] Advanced sort on vector of slice


#1

Hi everyone !

My issue here is that I want to sort a vector by path names AND by directory THEN files.

I have a vector that kind of resembles this: Vec<(Path, Stat)>
I want to sort it like described above.

I am currently trying to do it with a sort_by on my vector but I don’t quite understand how sorting and ordering works in Rust.


#2

Since I need to send all of this to a String output I came up with a solution:

// Sorting the dirs and files by name
v_dirs_and_files.sort_by(|a, b| a.0.cmp(&b.0));

let mut s_files = String::new();
let mut  s_dirs = String::new();
v_dirs_and_files.into_iter()
    .for_each(|s| {
        if s.1.is_file() {
            s_files += &format!("{};", substring_recommend(s.0.into_os_string().to_string_lossy().to_mut(), 2, None).unwrap());
        } else {
            s_dirs += &format!("{};", substring_recommend(s.0.into_os_string().to_string_lossy().to_mut(), 2, None).unwrap());
        }
    });

response = format!("{}{}", s_dirs, s_files);

I don’t like this because my vector isn’t in the order I want.
But this is the concept.


#3

Well, lets start by looking at what Vec<T>::sort_by() returns:
https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sort_by

We see it must return anything from the enumeration Ordering:
https://doc.rust-lang.org/std/cmp/enum.Ordering.html

This makes sense: when you are sorting something, you want to check to see if it should be before, after or stay, which is what happens when you return Less, Greater, and Equal respectively.

What you are asking is kind of complex though. You want to sort paths by their directories, and finally files.

Because directories and files can be treated as the same thing, text between some slashes, there is no need to differentiate really.

Then you need to ask yourself: Is “Aaaa” greater than “Z”? What about “Az” and “Z”? What about lower and uppercase? Special characters?

You see this becomes complex quickly.

Now I’ll bet that there is either a crate or a function in the standard library that lets you compare strings lexicographically :slight_smile: (https://doc.rust-lang.org/std/string/struct.String.html hint it does!)

Then you can easily sort:

/Test/Z
/Test/A/Z
/Test/A/A

With:

type FilePath = String;

fn sort_file_paths(paths: Vec<FilePath>) -> Vec<FilePath> {
    paths.sort_by(|a, b| a.cmp(b))
}

And that should take care of everything. Not very advanced right?


#4

I just saw your code post. You can see your choice of data type Vec<(Path, File)> (I know you wrote Stat here but it may be confusing to readers) has really complicated your code!

An alternative if you want to keep this “information” (separation of Path and File) is when comparing, turn your tuple into a String that can be lexicographically compared, as I’ve shown above.

I’m also confused why you would want to print all directories first, then all files second.

I would sort my two lists separately. In your case, split the tuples into two lists.


#5

Then you can easily sort:

I don’t think I was clear enough :confused:

My Vector is like this:
Vector<(PathBuf, FileStat)>

This is not a choice, this is what is returned to me by Rust’s ssh2 crate.
One more thing is that Paths are always in a single directory (no need to bother about subdirs as this will be taken care of by the website the information is being sent to)

An alternative if you want to keep this “information” (separation of Path and File) is when comparing, turn your tuple into a String that can be lexicographically compared, as I’ve shown above.

I think I understand, let me try something out.


#6

Please link me to the crate and the function you are callling.

To me PathBuf is probably the whole path, including the file… and FileStat are the permissions on the file.


#7

http://alexcrichton.com/ssh2-rs/ssh2/struct.Sftp.html
Here I’m calling pub fn readdir(&self, dirname: &Path) -> Result<Vec<(PathBuf, FileStat)>, Error>

Since it is Sftp, the root directory is my home directory. It is perfect for me.

Another solution I found (one that actually orders the vector):

v_dirs.sort_by(|a, b| {
    let a_prefix = if a.1.is_file() {
        format!("{}:{:?}", "f", a.0)
    } else {
        format!("{}:{:?}", "d", a.0)
    };
    let b_prefix = if b.1.is_file() {
        format!("{}:{:?}", "f", b.0)
    } else {
        format!("{}:{:?}", "d", b.0)
    };
    a_prefix.cmp(&b_prefix)
});

The logic is that I add a prefix (d or f, where d is smaller than f) and it sorts them exactly as I want. This does the job but leaves a lot to be desired…

I am open to suggestions.


#8

Ohhh, I understand what you are trying to achieve now…

PathBuf can be a directory or file, based on its FileStat. So you just want to literally print directories sorted, then files sorted.

v_dirs.sort_by(|a, b| {
  if a.1.is_file() && b.1.is_dir() { Ordering::Equal }
  else if a.1.is_dir() && b.1.is_file() { Ordering::Greater }
  else { a.cmp(b) }
});

This should:

  1. Not change anything if a file is already before a directory (I may have confused the ordering here)
  2. Swap a directory and file if it comes before a file
  3. Otherwise we are comparing a file and a file, or a directory and a directory.

Note I have not tested this.


#9

So you want files directories sorted by name, then directories files sorted by name?

In other words, you have two levels of sorting:

  • Higher priority: file vs directory
  • Lower priority: name

The sort methods on slice are stable sorts, so you can achieve this by doing two separate sorts:

// Sort by least important field first
v_dirs.sort_by_key(|pair| pair.0.to_owned());
// Then when you sort by file-versus-directory-ness, all the files
// will keep their order from the first sort, as will all the directories
v_dirs.sort_by_key(|pair.1| stat.1.is_file()); // false < true so dirs < files

Although an even easier way to do it is to just use a tuple as your key, so that you can compare multiple properties at once. Tuples are compared lexicographically, so put the most important field (file vs directory) first.

v_dirs.sort_by_key(|pair| (pair.1.is_file(), pair.0.to_owned()));

Edit: got file vs dir order backwards, fixed
Edit 2: to_owned turns Path into an owned PathBuf, which is necessary to work around limitations related to lifetimes in sort_by_key's signature. You could avoid that using sort_by but it’s messier.


#10

Your 1st suggestion is what I was going to do, but then thought “lets try to keep it within the closure”.

But your 2nd suggestion is just wicked. Awesome :blush:


#11
v_dirs.sort_by_key(|pair| (pair.1.is_file(), pair.0.to_owned()));

It’s possible to use tuple destructuring to further simplify this.

v.dirs.sort_by_key(|(path, stat)| (stat.is_file(), path.to_owned());

#12

that is what I I originally wrote, but decided to change it back once I noticed the error and the corrected form began to look like this:

v.dirs.sort_by_key(|&(ref path, ref stat)| (stat.is_file(), path.to_owned());

#13

Don’t use sort_by_key and then call .to_owned() inside the closure. That will make two allocations and two memcopies on every comparison. Expect horrible performance. :slight_smile:

Instead, I am proposing two fast and reasonably easy solutions.

First, a simple solution that uses unstable sort to order by path, and then stable sort to order by is_file():

v_dirs.sort_unstable_by(|(a, b)| a.0.cmp(&b.0));
v_dirs.sort_by_key(|p| p.1.is_file());

Second, and the fastest solution – one unstable sort with a complex comparison function:

v_dirs.sort_unstable_by(|a, b| a.1.is_file().cmp(&b.1.is_file()).then_with(|| a.0.cmp(&b.0)));

#14

Thank you all for that amazing help !

I knew this could be done in one line and actually I was extremely close to what @stjepang gave in the second solution, what I was missing was .then_with().

I like knowing about possible optimizations too. But since I am not far into the developpement I try not to care too much for now.


#15

when I alluded to “you could avoid that using sort_by,” what I had in mind was this:

vec.sort_by(|a,b| {
    // note: this could be defined anywhere. Defining
    //  it inside the closure is just a style choice
    fn key(v: &Vec<i32>) -> (bool, Path) {
        (stat.is_file(), path)
    }
    key(a).cmp(key(b))
});

A tricky thing here is that key needs to be defined in such a way that v and the output Path have the same lifetime. The way I wrote it by using a fn (instead of a closure) makes this work because it is equivalent to

fn<'a> key(v: &'a Vec<i32>) -> (bool, Path<'a>) { ... }

A closure would work in cases where the compiler can infer that the function must satisfy for<'a> FnMut(&'a Vec<i32>) -> (bool, Path<'a>) but it’s not easy to explain what these cases precisely are (it’s something I’ve just had to gain an intuition for over time). If it’s not able to infer this then it will incorrectly assume that key(a) and key(b) must have the same lifetime, and it will fail to decide what that lifetime is.


…this is all rather ugly stuff that I didn’t want to dump on someone who probably doesn’t need to worry about it, hence to_owned :stuck_out_tongue:


#16

@ExpHP I already had issues with lifetimes and read through the entire book (at least twice) to begin to even understand. My second post here was about lifetimes and I learned a lot from this.
I understand what you code is doing, I’m sure it does exactly what I want but I knew there was a one-liner to do this.
Rust has really surpised me for this because although I don’t have access to a good and working pre-programmed substring (yes I know you can use the bytes to do this (which is what I do right now because of another issue I have with ssh2 for rust…)), some things are so easy to do compared to everything else that I know.

Just a quick question too:

How can I format a String from a vector like this:
value1;value2;value3

Everything is separated with “;” but there is no “;” at the end of the string:
In other languages I would do a for and avoid adding the “;” at the end with the index (I know where I am)

Here I do this:

v_dirs.into_iter().for_each(|s| response += &format!("{};", substr(&s.0.into_os_string().to_string_lossy(), 2, None).unwrap_or("")));

(The substr function simply removes 2 charactes that I do not want at the beggining of each path: “./”)
As you can see, this solution adds a “;” at the end of the string.
What I can do is passing it to my substring function after it is completely formed and remove the last character.

What I would like to know is if there is any way to know if I am at the end of the loop to avoid adding an unnessecary “;” ?


#17

You can use join from itertools. Alternatively, you can either collect your modified strings into a Vec and then call slice join on that. Or, you can use Iterator::enumerate and Iterator::fold to do it manually (the enumerate will give you the index of the item you’re looking at, which you can compare against the length of the vector).


#18

Note that, more generally, you can usually make use of explicit calls to iterator.next() to solve these kinds of “fencepost vs fence segment” problems that you might be used to solving by looking at indices.

fn sentence_case(sentence: &str) -> String {
    let mut words = sentence.split_whitespace().map(|s| s.to_lowercase());
    let mut out = String::new();

    // First word is Propercase and needs no space before it
    if let Some(word) = words.next() {
        out += &capitalize_first_letter(&word);
    } else { return out; }

    for word in words {
        out += " ";
        out += &word;
    }

    out
}

fn capitalize_first_letter(s: &str) -> String {
    unimplemented!()
}

This works well for special-casing the initial items, but doesn’t help with when you want to special-case the last item. Itertools has with_position() which looks helpful, though I’ve never used it myself.


#19

You can use join2 from itertools.

Well, this looks useful! I’ll look into that later but I think this is the solution. I’m not afraid to use many libraries in rust because I find it mostly works (I had bad experiences in C, C++ and Java)

This works well for special-casing the initial items, but doesn’t help with when you want to special-case the last item. Itertools has with_position() which looks helpful, though I’ve never used it myself.

I thought about this but since it’s the last item that really bothers me I didn’t try to do anything with it (the next() I mean)
Itertools seem to have some useful things.

Thank you all, I think you answered my questions related to this issue, time for another one regarding ssh2 this time. You’ll see me post in some time about it.