Should this perhaps be mentioned in some high-visibility location like the documentation of the &str primitive type? Or is this a well-known convention in the wild for languages that default to UTF-8, since writing a code point in UTF-32 is briefer than spelling out the UTF-8 bytes?
I wouldn't say that the \u escapes use UTF-32. They specify a Unicode character by its code point, a concept that is independent of any particular character encoding form. Then, all that needs to be said is that Rust String and str use UTF-8---and that is mentioned prominently in many places.
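To make the distinction concrete, here's a minimal Rust sketch: the escape names the code point, and the UTF-8 bytes only appear once the character lands in a `String`:

```rust
fn main() {
    // '\u{...}' names a code point; a char holds a Unicode scalar value.
    let heart = '\u{1F496}'; // U+1F496 SPARKLING HEART
    assert_eq!(heart as u32, 0x1F496);

    // An encoding only enters the picture once the character is stored
    // in a String or &str, which are always UTF-8 in Rust.
    let s = heart.to_string();
    assert_eq!(s.as_bytes(), [0xF0, 0x9F, 0x92, 0x96]); // four UTF-8 bytes
}
```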
Ah I see, thanks for the clarification! I hadn't quite wrapped my mind around the distinction between a code point and its encoding as I don't deal with such things much.
I had just been surprised since, according to fileformat.info's page for Unicode Character 'SPARKLING HEART' (U+1F496), C/C++/Java apparently use the UTF-16 encoding for their escape sequences; so I expected to see the UTF-8 encoding in Rust's escape sequences.
I think fileformat.info is wrong there, for C and C++. According to cppreference.com, the four or eight hex digits following a \u or \U escape in a C++ string also specify a Unicode code point, just as in Rust; Rust just uses {curly brackets} instead of a fixed number of digits. So the correct way to write "Sparkling Heart" in a C++ string is "\U0001f496", not the given "\uD83D\uDC96". This C++ playground seems to agree.
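Rust enforces that distinction at compile time, too: since \u{...} takes a code point, the surrogate-pair spelling isn't even accepted. A quick sketch (the commented-out line won't compile; if I recall correctly, the error notes that a unicode escape must not be a surrogate):

```rust
fn main() {
    // The single code-point escape is all you need:
    let s = "\u{1F496}";
    assert_eq!(s, "💖");

    // Surrogate code points (U+D800..=U+DFFF) aren't Unicode scalar
    // values, so this is a compile error in Rust:
    // let bad = "\u{D83D}\u{DC96}";
}
```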
In Java and JavaScript, the story is different. Those languages explicitly use UTF-16 strings, so a character like Sparkling Heart needs to be written as a surrogate pair. That's what fileformat.info is showing, I think.
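If you want to see that surrogate pair from Rust, you can re-encode the string as UTF-16; a small sketch:

```rust
fn main() {
    // Re-encode as UTF-16 to see the pair that the Java/JavaScript
    // escapes \uD83D\uDC96 are spelling out.
    let units: Vec<u16> = "\u{1F496}".encode_utf16().collect();
    assert_eq!(units, [0xD83D, 0xDC96]); // high surrogate, low surrogate
}
```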