I'm trying to align UTF-8 strings, especially when they contain Asian scripts. The only trick I found was to use tabs, but I can't find a general solution.
See this:
use unicode_width::UnicodeWidthStr;

fn align1(s1: &str, s2: &str) {
    let w1 = s1.chars().count();
    let w2 = s2.chars().count();
    let w = usize::max(w1, w2);
    println!("<{:<w$}>", s1);
    println!("<{:<w$}>", s2);
}

fn align2(s1: &str, s2: &str) {
    let w1 = UnicodeWidthStr::width(s1);
    let w2 = UnicodeWidthStr::width(s2);
    let w = usize::max(w1, w2);
    println!("<{:<w$}>", s1);
    println!("<{:<w$}>", s2);
}

fn align3(s1: &str, s2: &str) {
    println!("<{}\t\t\t>", s1);
    println!("<{}\t>", s2);
}

fn main() {
    let s1 = "香港.中國";
    let s2 = "xn--j6w193g.xn--fiqz9s.";
    align1(s1, s2);
    align2(s1, s2);
    align3(s1, s2);
}
for which the last call gives a correct alignment (Rust Playground).
The problem you are having is in part because std::fmt’s concept of width is the naive chars().count() one, so it doesn't pad by the correct amount. You need to arrange for all widths to be computed using the unicode-width definition, not the built-in one. Here is one way to do that:
fn align4(s1: &str, s2: &str) {
    let w1 = UnicodeWidthStr::width(s1);
    let w2 = UnicodeWidthStr::width(s2);
    let w = usize::max(w1, w2);
    let padding_1 = w - w1;
    let padding_2 = w - w2;
    let blank = "";
    println!("<{s1}{blank:<padding_1$}>");
    println!("<{s2}{blank:<padding_2$}>");
}
This way, we are only asking std::fmt to write a specific number of spaces, not to consult the width of s1.
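If you need this in more than one place, the same idea can be pulled into a small helper that takes the width function as a parameter. The following is only a sketch: `pad_right` and `approx_width` are hypothetical names, and `approx_width` is a crude stand-in (two columns for a few common CJK ranges, one column otherwise) so that the example compiles with the standard library alone. In real code you would pass `UnicodeWidthStr::width` instead.

```rust
/// Pads `s` on the right with spaces until it occupies `target` columns,
/// where columns are measured by the caller-supplied `width_of` function.
/// Strings that are already wider than `target` are returned unchanged.
fn pad_right(s: &str, target: usize, width_of: impl Fn(&str) -> usize) -> String {
    let padding = target.saturating_sub(width_of(s));
    format!("{s}{}", " ".repeat(padding))
}

/// Hypothetical stand-in for `UnicodeWidthStr::width` (an assumption for
/// illustration only): counts two columns for a few common CJK ranges and
/// one column for everything else. It is far less accurate than the crate.
fn approx_width(s: &str) -> usize {
    s.chars()
        .map(|c| match c as u32 {
            0x1100..=0x115F
            | 0x2E80..=0xA4CF
            | 0xAC00..=0xD7A3
            | 0xF900..=0xFAFF
            | 0xFF00..=0xFF60 => 2,
            _ => 1,
        })
        .sum()
}

fn main() {
    let rows = ["香港.中國", "xn--j6w193g.xn--fiqz9s."];
    // Target width is the widest row under the chosen width function.
    let w = rows.iter().map(|s| approx_width(s)).max().unwrap_or(0);
    for row in rows {
        println!("<{}>", pad_right(row, w, approx_width));
    }
}
```

Keeping the width function as a parameter means std::fmt is never asked to measure the string itself, and you can swap in whatever measurement matches your renderer.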
However, this still won't work for everyone, because different fonts and text renderers have different notions of character width. The only truly reliable way to align text is to ask the text renderer actually in use what it thinks the width is. (In terminals, you can do this by writing the string and then asking the terminal where the cursor ended up, but that is slow. On the other hand, terminals and other “always monospace” renderers are also more likely to agree with unicode-width than general-purpose text renderers such as that in a web browser.)
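To make the terminal approach a little more concrete: on terminals that support the standard cursor position report, you send `ESC [ 6 n` and the terminal replies with `ESC [ row ; col R`. The sketch below only parses such a reply; `parse_cursor_report` is a hypothetical helper, and actually sending the query and reading the answer requires putting the terminal into raw mode, which is not shown here and is assumed to be handled elsewhere.

```rust
/// Parses a cursor position report of the form `ESC [ row ; col R`
/// and returns the 1-based `(row, col)` pair, or `None` if malformed.
fn parse_cursor_report(reply: &str) -> Option<(u32, u32)> {
    let body = reply.strip_prefix("\x1b[")?.strip_suffix('R')?;
    let (row, col) = body.split_once(';')?;
    Some((row.parse().ok()?, col.parse().ok()?))
}

fn main() {
    // Example reply a terminal might send with the cursor at row 3, column 10.
    // In a real program this string would be read from the terminal after
    // writing the query "\x1b[6n" (assumption: raw mode is already set up).
    let reply = "\x1b[3;10R";
    println!("{:?}", parse_cursor_report(reply));
}
```

If the string did not wrap, the difference between the column reported after writing it and the column reported before is the width the terminal actually used.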