Do you want 7 chars or 7 bytes? If the former, then it is almost trivial using iterators: s.chars().take(7).collect(). But that doesn't account for grapheme clusters (when a single user-perceived symbol spans multiple Unicode code points). If you want 7 bytes, it gets more involved. The example using fold is not ideal because it lacks an early return for when you pass really long strings, but you can change that to a for loop instead.
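For the 7-byte case, a for loop with an early exit could look roughly like this. It is only a sketch: take_bytes is a name made up for illustration, and it assumes you want the longest prefix that fits in the byte budget without splitting a code point.

```rust
/// Returns the longest prefix of `s` that fits in `max_bytes` bytes
/// without cutting a code point in half.
/// (`take_bytes` is a hypothetical helper name, not a std function.)
fn take_bytes(s: &str, max_bytes: usize) -> &str {
    let mut end = 0;
    for (idx, ch) in s.char_indices() {
        let next = idx + ch.len_utf8();
        if next > max_bytes {
            // Early exit: the rest of a very long string is never scanned.
            break;
        }
        end = next;
    }
    &s[..end]
}
```

Because it returns a borrowed slice, nothing is allocated; call .to_string() on the result if you need an owned String.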
If you want to cap memory use, then the byte length is the correct measurement, and truncate() already does that. Note that truncate() panics if the new length does not fall on a char boundary, so if you are not sure 7 is at a code point boundary, use floor_char_boundary()/ceil_char_boundary() to round it down or up first.
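If floor_char_boundary() isn't available on your toolchain, the same rounding-down can be sketched with is_char_boundary(). The helper name truncate_to_bytes is invented here for illustration:

```rust
/// Truncates `s` in place to at most `max_bytes` bytes, backing off to the
/// nearest char boundary at or below the limit so `truncate` cannot panic.
/// (`truncate_to_bytes` is a hypothetical helper name.)
fn truncate_to_bytes(s: &mut String, max_bytes: usize) {
    if s.len() <= max_bytes {
        return; // already short enough
    }
    let mut cut = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(cut) {
        cut -= 1;
    }
    s.truncate(cut);
}
```

A UTF-8 code point is at most 4 bytes long, so the loop backs off by at most 3 bytes.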
If you want the first 7 code points (the char type in Rust), use the chars() iterator [1]:
let result = inp.chars().take(7).collect::<String>();
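If you only need a view into the original string rather than a new String, one possible variant uses char_indices() to find the byte offset of the eighth code point and slices there. first_chars is a name made up for this sketch:

```rust
/// Returns the first `n` code points of `s` as a borrowed slice.
/// (`first_chars` is a hypothetical helper name.)
fn first_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        // `idx` is the byte offset where code point number n + 1 starts.
        Some((idx, _)) => &s[..idx],
        // Fewer than n + 1 code points: the whole string qualifies.
        None => s,
    }
}
```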
If you want 7 human-perceived "characters" as in natural languages [2], this is a really hard problem (just like all Unicode problems). Such features are not available in the standard library, so you'll have to use a third-party crate such as unicode-width or unicode-segmentation. Read their documentation for details.
I want to add: Unicode grapheme clusters can in theory contain an unbounded number of code points, so even if you are processing by grapheme cluster, you still need to set an upper limit on the byte length.
[1] Although code point count isn't a very useful metric if you are dealing with Unicode.
Yes: unicode-width is what rustc uses for this and it works "well enough". Some terminals have issues with emoji being rendered 1.5 columns wide. Most terminals have no support for "compound emoji" (like yours does), i.e. grapheme clusters that are meant to be shown as a single emoji like the ZWJ family above, so rustc simply removes all ZWJ from the output so that underlines are more likely to align properly with their intended text (cue "rustc separates families" sub-thread).
Given the updates, you can write something like the following, though you could optimize it further to avoid a few allocations:
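    use unicode_width::UnicodeWidthChar;

    fn visual(s: &str) -> String {
        // The byte length might exceed 7, but this limits the number of reallocations.
        let mut x = String::with_capacity(7);
        let mut w = 0; // display width accumulated so far, in columns
        for c in s.chars() {
            // width() returns None for control characters; count those as 1 column.
            let c_w = UnicodeWidthChar::width(c).unwrap_or(1);
            if w + c_w > 7 {
                break;
            }
            w += c_w;
            x.push(c);
        }
        // Left-pad with spaces so the result is right-aligned to 7 columns.
        // The loop above guarantees w <= 7, so this range may be empty.
        for _ in w..7 {
            x.insert(0, ' ');
        }
        x
    }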
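One way to trim the allocations, sketched under a couple of assumptions: measure first, then build the output once with the padding already in front. To keep the sketch free of external dependencies, the width lookup is passed in as a closure. Both visual_with and its width_of parameter are names invented here.

```rust
/// Right-aligns the longest prefix of `s` that fits in `max_width` columns.
/// `width_of` reports the display width of one char; it stands in for a
/// real width table. (`visual_with` is a hypothetical helper name.)
fn visual_with(s: &str, max_width: usize, width_of: impl Fn(char) -> usize) -> String {
    let mut width = 0; // columns used so far
    let mut end = 0; // byte offset just past the last char that fits
    for (idx, c) in s.char_indices() {
        let c_w = width_of(c);
        if width + c_w > max_width {
            break;
        }
        width += c_w;
        end = idx + c.len_utf8();
    }
    // Allocate once: padding first, then the prefix, so nothing is shifted.
    let pad = max_width - width;
    let mut out = String::with_capacity(pad + end);
    out.extend(std::iter::repeat(' ').take(pad));
    out.push_str(&s[..end]);
    out
}
```

With the crate from the snippet above, the closure would be |c| unicode_width::UnicodeWidthChar::width(c).unwrap_or(1), and max_width would be 7.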
Terminal width for non-ASCII text is even more cursed than all the complications of what is a "character" in Unicode, because terminal implementations have their own opinions on which code points are "wide" and which aren't, and this even varies with the fonts installed.
There are crates that contain some tables/heuristics for simple cases, and there are crates that perform ANSI hacking black magic to measure actual rendered width of text in the terminal.