Unicode-aware string sorting

Hi, I have a list of strings stored in a Vec, and I want to present it to a user in sorted order. I would like the sort to take non-ASCII characters into account (so e.g. Ä would get sorted between A and B in German, instead of as a separate character after Z). I am OK with having to explicitly select the locale the sort should follow. Ideally I'd also like case folding, but I am OK without it.
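To illustrate the problem, here is a minimal sketch of what the default sort does. The helper name `sorted_by_bytes` and the word list are made up for this example; the only library behavior relied on is that `str` ordering in the standard library compares UTF-8 bytes.

```rust
/// Hypothetical helper for illustration: returns the words sorted with the
/// standard library's default `str` ordering, which compares UTF-8 bytes
/// and knows nothing about locales.
fn sorted_by_bytes(words: &[&str]) -> Vec<String> {
    let mut sorted: Vec<String> = words.iter().map(|w| w.to_string()).collect();
    sorted.sort();
    sorted
}
```

Calling `sorted_by_bytes(&["Zebra", "Ärzte", "Ast", "Bär"])` yields Ast, Bär, Zebra, Ärzte: the Ä lands after Z because its UTF-8 encoding starts with a byte larger than any ASCII letter.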

I can't seem to find a crate that does this, which feels extremely surprising to me, but maybe I have just been looking under the wrong keywords.

I would be grateful for any suggestions.

"Unicode-aware" is the wrong term here; you mean locale-aware.
As a native German speaker, I'm not sure where I would put the Ä, or whether I would distinguish it from the A at all. For example, should Ärzte come before Ast or not?


The technical term for this is collation, and it does indeed appear that Rust has no real facilities for it at the moment. The UNIC project appears to be a significant effort to implement most of Unicode, but collation is still on its to-do list. You might be able to use Servo's icu-sys crate, but that only provides raw bindings to libicu.
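Until a proper collation crate is available, a hand-rolled sort key can serve as a rough stopgap for a single language. The sketch below is only an approximation for German and is not real collation: it uses no locale data and does not implement the Unicode Collation Algorithm. The names `german_sort_key` and `sort_german` are hypothetical, and the choice to treat umlauts like their base vowels (and ß like "ss") is an assumption about the desired ordering.

```rust
/// Hypothetical helper: builds a comparison key that roughly follows German
/// dictionary ordering (umlauts sort like their base vowel, ß like "ss").
/// This is an approximation, not the Unicode Collation Algorithm.
fn german_sort_key(s: &str) -> String {
    let mut key = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            'ä' | 'Ä' => key.push('a'),
            'ö' | 'Ö' => key.push('o'),
            'ü' | 'Ü' => key.push('u'),
            'ß' => key.push_str("ss"),
            // Crude case folding: lowercase everything else.
            other => key.extend(other.to_lowercase()),
        }
    }
    key
}

/// Sorts in place by the approximate key. Ties (e.g. "Bar" vs "Bär") fall
/// back to plain `str` ordering so the result is deterministic.
fn sort_german(words: &mut [String]) {
    words.sort_by(|a, b| {
        german_sort_key(a)
            .cmp(&german_sort_key(b))
            .then_with(|| a.cmp(b))
    });
}
```

With this key, Ärzte sorts before Ast, because the keys "arzte" and "ast" first differ at 'r' versus 's'. Other locales (and even other German conventions, such as phone-book ordering) would need different rules, which is exactly what a real collation library provides.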


Thank you for your help. I think I will take a closer look at icu-sys for now.
