Edit: Never mind, I figured it out. I overlooked a single 'unconditional mapping' in SpecialCasing.txt that causes the behavior.
I'm trying to understand the behavior of the to_lowercase method for char.
It returns an iterator because lowercasing isn't always one-to-one in Unicode, but the only char for which it returns more than one character is 0x130 (capital I with dot above). All other chars return a single character when lowercased.
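A minimal sketch of how this can be checked, using only `char::from_u32` and `char::to_lowercase` from the standard library. The function name `multi_char_lowercase` is my own, and the exact list it produces depends on the Unicode version your toolchain ships:

```rust
/// Collects every `char` whose lowercase mapping is longer than one `char`,
/// together with that mapping.
fn multi_char_lowercase() -> Vec<(char, String)> {
    (0..=0x10FFFF_u32)
        .filter_map(char::from_u32) // skips the surrogate range
        .filter(|c| c.to_lowercase().count() > 1)
        .map(|c| (c, c.to_lowercase().collect()))
        .collect()
}

fn main() {
    for (c, lower) in multi_char_lowercase() {
        // `escape_unicode` makes combining marks visible in the output.
        println!("U+{:04X} -> {}", c as u32, lower.escape_unicode());
    }
}
```

The same loop with `to_uppercase` should list many more code points, which is what made the lowercase result look odd to me.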
According to the documentation:
If this char requires special considerations (e.g. multiple chars) the iterator yields the char(s) given by SpecialCasing.txt.
This operation performs an unconditional mapping without tailoring. That is, the conversion is independent of context and language.
But the behavior for 0x130 seems to match the behavior in UnicodeData.txt, not SpecialCasing.txt, despite there being a matching entry in the latter. Many other characters in UnicodeData.txt have similar one-to-multiple lowercase rules as 0x130, and yet they yield only a single character when lowercased.
The Unicode FAQ seems somewhat inconsistent here too:
Q: Is all of the Unicode case mapping information in UnicodeData.txt?
A: No. The UnicodeData.txt file includes all of the one-to-one case mappings. Since many parsers were built with the expectation that UnicodeData.txt would have at most a single character in each case mapping field, the file SpecialCasing.txt was added to provide the one-to-many mappings, such as the one needed for uppercasing ß (U+00DF LATIN SMALL LETTER SHARP S). In addition, CaseFolding.txt contains additional mappings used in case folding and caseless matching. For more information, see Section 5.18, Case Mappings in The Unicode Standard.
There are clearly one-to-many mappings in UnicodeData.txt, and Rust seems to use one of them but no others. I can find no rhyme or reason for this behavior. For comparison, look at the mapping for 0x120 in UnicodeData.txt.
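For reference, a small sketch comparing the two code points side by side, plus the ß example from the FAQ. The helpers `lower` and `upper` are hypothetical names of mine that just collect the iterators returned by `char::to_lowercase` and `char::to_uppercase` into a `String`:

```rust
/// Collects the lowercase mapping of `c` into a `String`.
fn lower(c: char) -> String {
    c.to_lowercase().collect()
}

/// Collects the uppercase mapping of `c` into a `String`.
fn upper(c: char) -> String {
    c.to_uppercase().collect()
}

fn main() {
    // 0x120 lowercases to the single precomposed character U+0121.
    assert_eq!(lower('\u{120}'), "\u{121}");
    // 0x130 lowercases to two characters: 'i' followed by U+0307
    // (combining dot above).
    assert_eq!(lower('\u{130}'), "i\u{307}");
    // The FAQ's one-to-many example: uppercasing U+00DF (ß) gives "SS".
    assert_eq!(upper('\u{DF}'), "SS");
}
```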