I've made a function to reliably convert UTF-8 grapheme clusters stored in a 4-byte variable into a String.
I'd like to know if there is an alternative way to make this more efficient by skipping some steps, ideally making it zero-copy by avoiding the heap, perhaps by using some kind of stack-allocated string from an external crate.
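For what it's worth, here is a minimal sketch of one heap-free approach that uses only the standard library. It assumes the 4 bytes hold valid UTF-8 padded with trailing NUL bytes, and that the caller has already turned the integer into bytes in the right order (for example with `u32::to_le_bytes` or `u32::to_ne_bytes`, depending on how the value was packed). The name `packed_as_str` is made up for illustration.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: views up to 4 packed UTF-8 bytes as a `&str`
/// without allocating. Assumes unused trailing bytes are NUL.
fn packed_as_str(bytes: &[u8; 4]) -> Result<&str, Utf8Error> {
    // Stop at the first NUL byte, or use all 4 bytes if there is none.
    let len = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
    // Validates the bytes and borrows them; nothing is copied.
    str::from_utf8(&bytes[..len])
}
```

The returned `&str` borrows from the byte array, so the array has to outlive it. A call site would look something like `let bytes = packed.to_le_bytes(); let s = packed_as_str(&bytes)?;`.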
Grapheme clusters might not fit into a 32-bit integer. (And code points that aren't Unicode Scalar Values can't be stored in a Rust str.) See also the documentation for Rust's primitive char type:
The char type represents a single character. More specifically, since ‘character’ isn’t a well-defined concept in Unicode, char is a ‘Unicode scalar value’.
It’s important to remember that char represents a Unicode Scalar Value, and might not match your idea of what a ‘character’ is. Iteration over grapheme clusters may be what you actually want. This functionality is not provided by Rust’s standard library, check crates.io instead.
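To make the distinction concrete, here is a small sketch using only the standard library. The helper names `is_scalar_value` and `sizes` are made up for illustration. Counting actual grapheme clusters would need an external crate, so the sketch only compares byte length with the number of `char`s.

```rust
/// Hypothetical helper: true if `cp` is a Unicode Scalar Value,
/// i.e. if it can be represented as a Rust `char`.
fn is_scalar_value(cp: u32) -> bool {
    char::from_u32(cp).is_some()
}

/// Hypothetical helper: returns (UTF-8 byte length, number of `char`s).
fn sizes(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

With these, `is_scalar_value(0xD800)` is false because surrogate code points are not scalar values. `sizes("e\u{301}")` gives `(3, 2)`: a single user-perceived character (e with a combining acute accent) made of two `char`s. A flag such as `"\u{1F1E9}\u{1F1EA}"` is also one grapheme cluster but takes 8 bytes, so it would not fit in a 32-bit integer.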
Yeah! Thanks, I had to learn the difference a while ago, since it was really confusing for me at first.
I do really mean a grapheme cluster in this case. I'm working on the bindings for a C library that stores the grapheme cluster only if it's up to 4 bytes, or a pointer to where it's stored if it's bigger than that, see: libnotcurses_sys::NcCell
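A rough sketch of how that "inline or pointer" layout could be modelled on the Rust side without allocating. This is not the actual `NcCell` API: the `Cluster` type and its variants are hypothetical, and it assumes the externally stored cluster is a NUL-terminated UTF-8 C string that the bindings have already wrapped in a `&CStr`.

```rust
use std::ffi::CStr;
use std::str::{self, Utf8Error};

/// Hypothetical model of a cluster that is either packed inline
/// (up to 4 bytes, NUL-padded) or stored elsewhere as a C string.
enum Cluster<'a> {
    Inline([u8; 4]),
    External(&'a CStr),
}

impl Cluster<'_> {
    /// Borrows the cluster as a `&str` in both cases; no heap allocation.
    fn as_str(&self) -> Result<&str, Utf8Error> {
        match self {
            Cluster::Inline(bytes) => {
                // Trim the NUL padding before validating as UTF-8.
                let len = bytes.iter().position(|&b| b == 0).unwrap_or(bytes.len());
                str::from_utf8(&bytes[..len])
            }
            // `CStr::to_str` validates UTF-8 and excludes the trailing NUL.
            Cluster::External(cstr) => cstr.to_str(),
        }
    }
}
```

Either way the caller gets a borrowed `&str`, so a `String` is only needed if the text has to outlive the cell.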