I'm writing a solution to Advent of Code 2016 day 7, and I need to create a function that takes a 4-letter sequence and compares its characters with each other.
Is there a concise way to bind the first n characters of a string to variables? Something like let (a,b,c,d) = &s[..4] or let [a,b,c,d,_] = &s[..].
I remember reading something about pattern matching on slices of the source string a few months back, but I can't find it.
I also apologize if this has already been asked with a different phrasing I couldn't think of.
There is no pattern matching of chars within a str, as a char is basically a u32 with invariants, while a str is UTF-8 (an encoding wherein each char-equivalent is encoded with a variable number of bytes). Ranged slices of a str are possible, but problematic in practice, as they will panic if a bound does not fall on a char boundary. You can also pattern match against bytes with .as_bytes(), which is sufficient for some online drill-type problems, but not necessarily good practice (unless you're in an ASCII-constrained environment, perhaps).
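To make the .as_bytes() suggestion concrete, here is a minimal sketch of a slice pattern over bytes. It assumes ASCII input and assumes the comparison wanted is the puzzle's mirrored-pair check (two different characters followed by the same pair reversed). The function names starts_with_abba and contains_abba are made up for illustration.

```rust
/// Returns true if the first four bytes of `s` are a mirrored pair
/// such as "abba": two different bytes followed by the same pair reversed.
/// Assumption: the input is ASCII, so one byte is one character.
fn starts_with_abba(s: &str) -> bool {
    // Slice pattern: bind the first four bytes and ignore the rest.
    match s.as_bytes() {
        [a, b, c, d, ..] => a == d && b == c && a != b,
        _ => false, // fewer than four bytes
    }
}

/// Returns true if any four-byte window of `s` is a mirrored pair.
fn contains_abba(s: &str) -> bool {
    s.as_bytes()
        .windows(4)
        .any(|w| matches!(w, [a, b, c, d] if a == d && b == c && a != b))
}
```

Because the match has a fallback arm, inputs shorter than four bytes return false instead of panicking, which a plain &s[..4] would not do.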
Now I'm wondering: how common are actual production environments that are ASCII-constrained?
Also, I remember the Rust manual mentioning libraries for isolating grapheme clusters. Can a function that assumes ASCII-only input be safely converted into one that works on grapheme clusters? I'm asking so I can judge how urgently I should learn to handle full Unicode properly.
(sorry for the very specific additional questions)
Anecdotal, but increasingly rare, I feel. Most environments I personally work with are one of the following:
- OsString-like at the system level: mainly supersets of ASCII. Sometimes programs or libraries assume ASCII in such environments, or conflate "byte string" with ASCII, but they're incomplete or buggy and generally must be fixed if they become more widely used.
- Unicode of some flavor, with a variable-width encoding.
It does come up from time to time in e.g. long-lived protocols, but it's nothing I would assume in the modern age.
There's no direct conversion, as each ASCII character has, well, a fixed size (technically a 7-bit value, but almost always stored as a byte), whereas a grapheme cluster may consist of multiple code points.
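As a small illustration of the size mismatch, the sketch below uses only the standard library to count bytes and code points. The helper names utf8_bytes and code_points are hypothetical. The standard library does not segment grapheme clusters itself; a third-party crate such as unicode-segmentation would be needed for that, which this sketch deliberately leaves out.

```rust
/// Number of bytes in the UTF-8 encoding of `s`.
fn utf8_bytes(s: &str) -> usize {
    s.len()
}

/// Number of Unicode code points (Rust `char`s) in `s`.
fn code_points(s: &str) -> usize {
    s.chars().count()
}

/// One user-perceived character, written as 'e' plus a combining
/// acute accent (U+0301): a single grapheme cluster, two code points.
const DECOMPOSED_E_ACUTE: &str = "e\u{301}";

/// The same user-perceived character as one precomposed code point (U+00E9).
const PRECOMPOSED_E_ACUTE: &str = "\u{e9}";
```

Both constants render as a single character, yet the first is two code points in three bytes and the second is one code point in two bytes, so neither a byte count nor a char count tells you how many characters a reader sees.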
Encodings, the presentation width of strings, the definition of a character, et cetera are tricky problems at the universal level. The best approach is still largely use-case specific, but generally speaking, I recommend developing a habit of considering encodings like UTF-8 (which is increasingly common) over assuming fixed-width or no-invariants encodings like ASCII or byte strings.
In particular, Rust Strings/strs are UTF-8, so at least be aware that you can't always logically split a str at an arbitrary byte offset (as you may be in the middle of a code point's encoding). If you're splitting Strings at the presentation level, aspire to graphemes. At the parsing/searching/matching level, chars or even bytes are often sufficient.
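The byte-offset caveat can be sketched with standard-library calls only. The helper names split_on_boundary and first_n_chars are hypothetical, and this is one possible way to avoid the panic rather than the only one.

```rust
/// Splits `s` at byte offset `mid`, or returns None if `mid` is past the
/// end or falls in the middle of a code point's encoding.
fn split_on_boundary(s: &str, mid: usize) -> Option<(&str, &str)> {
    if s.is_char_boundary(mid) {
        Some(s.split_at(mid))
    } else {
        None
    }
}

/// Returns the prefix of `s` holding its first `n` chars (code points),
/// or all of `s` if it has fewer than `n`.
fn first_n_chars(s: &str, n: usize) -> &str {
    // char_indices yields the byte offset at which each char starts,
    // so the offset of char number `n` is a valid place to cut.
    match s.char_indices().nth(n) {
        Some((idx, _)) => &s[..idx],
        None => s,
    }
}
```

With "héllo", where é occupies bytes 1 and 2, splitting at byte 2 gives None, while asking for the first two chars gives "hé". Note that this works at the char level only; a prefix cut this way can still separate a base letter from a combining mark that follows it.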