Have a look at the unicode-segmentation crate. It contains an extension trait called UnicodeSegmentation which can be used to iterate over the graphemes in a string. With that in hand you can just .take(N).
the_string.char_indices().nth(n) should give you the appropriate index (if you want N chars – I'm not sure if that's what you are after, or you want grapheme clusters, or all of that plus Unicode width, etc.)
Yes, I know you can take a string apart with UnicodeSegmentation and put it back together. But that's a lot of work to just keep log lines from overflowing.
I've written grapheme-oriented word wrap, so I know how, but that's overkill when all I want is truncation. Especially since most of the time, the string will be short enough it doesn't need to be truncated.
I'm just amazed that this isn't a standard library function.
There are actually good reasons that a bunch of otherwise desirable functionality wasn't included by default: it's so that the APIs can mature, evolve separately from stdlib, and if such a lib ever needs replacing, it can be done without gathering more and more deprecated stuff in stdlib over time.
Suppose the string I want to cut is “Ｈｅｌｌｏ， Ｗｏｒｌｄ！”. And I want to cut 8 characters. Is it “Ｈｅｌｌｏ， ” or “Ｈｅｌｌ”? What if the original had included only one space?
I would strongly suspect that if you want to keep log lines from overflowing then you want “Ｈｅｌｌ”, but then you not only need to write console-aware code, you also need a full-blown terminal library to know which characters your terminal treats as single-width and which as double-width!
String::truncate operates on byte indices, so if you give it a byte index that isn't on a valid char boundary, it panics. If you know your log lines are all ASCII, then assume that every byte is a character and use String::truncate however you like.
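If you can't be completely sure the input is ASCII, a small guard avoids the panic. This is only a sketch using std's str::is_char_boundary; the helper name truncate_to_boundary is made up for illustration and is not a standard library function.

```rust
/// Truncate `s` to at most `max_bytes` bytes.
/// If `max_bytes` falls inside a multi-byte character, back up to the
/// previous char boundary so that `String::truncate` cannot panic.
/// (Hypothetical helper, not part of std.)
fn truncate_to_boundary(s: &mut String, max_bytes: usize) {
    // Short strings are the common case: nothing to do.
    if s.len() <= max_bytes {
        return;
    }
    let mut end = max_bytes;
    // Index 0 is always a char boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    s.truncate(end);
}
```

For pure ASCII input this behaves exactly like calling String::truncate directly; for anything else it may keep a few bytes fewer than requested rather than panic.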
If you want to treat each codepoint as a letter, then write one line of code to compute where you want to truncate:
let upto = s.char_indices().map(|(i, _)| i).nth(10).unwrap_or(s.len());
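Wrapped up as a function, the same idea might look like the sketch below. The name truncate_chars is just an illustration; the only std calls involved are char_indices, Iterator::nth and String::truncate.

```rust
/// Keep at most `max_chars` code points of `s`, truncating in place.
/// (Hypothetical helper name; counts chars, not graphemes or columns.)
fn truncate_chars(s: &mut String, max_chars: usize) {
    // `nth(max_chars)` yields the byte offset of the first char to drop.
    // If the string has `max_chars` chars or fewer, it yields `None`
    // and the string is left untouched.
    if let Some((idx, _)) = s.char_indices().nth(max_chars) {
        s.truncate(idx);
    }
}
```

Since char_indices only ever reports offsets that sit on char boundaries, the truncate call here cannot panic, and the common short-string case does no allocation at all.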
If you want to get it as correct as possible, then use graphemes via the unicode-segmentation crate. You can use the same code above for chars, but with grapheme_indices.
No need to take anything apart and put it back together. Just count off the "glyphs" you want to show, find the byte offset where they end, and then do your truncation.
If you want to truncate a string to a maximum length approximately, you can use the unicode-width library to figure out how wide candidate pieces are.
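To make the width-based approach concrete, here is a rough std-only sketch. The width function is passed in as a parameter; toy_width below is a made-up stand-in that only knows about the Fullwidth Forms range U+FF01 to U+FF60 and treats everything else as one column. In real code you would plug in a proper per-char width from unicode-width instead (check that crate's docs for the exact API). Both function names are hypothetical.

```rust
/// Toy width table, for illustration only: fullwidth forms take two
/// columns, everything else is assumed to take one. A real program
/// should use a proper width table instead of this.
fn toy_width(c: char) -> usize {
    match c {
        '\u{FF01}'..='\u{FF60}' => 2,
        _ => 1,
    }
}

/// Truncate `s` in place so its estimated width is at most `max_cols`.
/// `width_of` supplies the column width of each char.
fn truncate_to_width(s: &mut String, max_cols: usize, width_of: impl Fn(char) -> usize) {
    let mut cols = 0;
    let mut end = s.len();
    for (idx, c) in s.char_indices() {
        let w = width_of(c);
        // Stop before the first char that would overflow the budget.
        if cols + w > max_cols {
            end = idx;
            break;
        }
        cols += w;
    }
    s.truncate(end);
}
```

With toy_width and a budget of 8 columns, the fullwidth "Hello, World!" example from earlier in the thread is cut after the fourth letter, since each fullwidth letter counts as two columns. This works per char, so combining marks and multi-codepoint graphemes would still need the grapheme-based approach on top.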
If you want to truncate a string to a maximum length exactly, then you need feedback from your text renderer, because widths of arbitrary strings are affected by both the renderer per se and the font chosen. (It's actually possible to do this with terminals — you can write a string and then ask the terminal where the cursor ended up, and memoize the results so you get it right every time after the first. I've implemented this strategy, though not as a separable crate.)