Hello forum,
I'm trying to parse strings from my Facebook data dump for fun. Everything that isn't a Latin character is encoded as Unicode escapes (emojis too, but I'm not interested in those):
\u00d1\u0087\u00d0\u00b0\u00d1\u0081 \u00d0\u00b8 \u00d0\u00b4\u00d0\u00b5\u00d0\u00b2\u00d0\u00b5\u00d1\u0082 \u00d0\u00bc\u00d0\u00b8\u00d0\u00bd\u00d1\u0083\u00d1\u0082\u00d0\u00b8
This tool gives me the desired output, which is this text in Cyrillic:
час и девет минути
I'm sorry, but I'm unfamiliar with text encodings and standards. As far as I understand, this isn't valid UTF-8, so Rust's built-in String::from_utf8 results in gibberish like this: ÐеÑÑÑ. But I guess you can build chars from "\u{00d0}"?
Is there any way I can turn the above Unicode escapes into valid Cyrillic UTF-8? Those are my only constraints.
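
To make the question concrete, here is a minimal sketch of what I have in mind. It rests on one assumption that I can't confirm for the whole dump: that every \u00XX escape actually stands for one raw UTF-8 byte rather than a real code point. The function name decode_fb_escapes is made up for illustration, and the sketch only handles \uXXXX escapes, not other JSON escapes such as \n or \".

```rust
/// Hypothetical helper: treats each `\u00XX` escape as ONE raw byte
/// (an assumption about the dump), then decodes the collected bytes as UTF-8.
/// Returns None on a malformed escape, an escape above 0xFF, or invalid UTF-8.
fn decode_fb_escapes(input: &str) -> Option<String> {
    let mut bytes: Vec<u8> = Vec::with_capacity(input.len());
    let mut rest = input;

    while !rest.is_empty() {
        if let Some(after_prefix) = rest.strip_prefix("\\u") {
            // Exactly four hex digits are expected after `\u`.
            let digits = after_prefix.get(..4)?;
            let value = u16::from_str_radix(digits, 16).ok()?;
            // Only values up to 0xFF can stand for a single byte.
            if value > 0xFF {
                return None;
            }
            bytes.push(value as u8);
            rest = &after_prefix[4..];
        } else {
            // Any other character is copied through unchanged.
            let len = rest.chars().next()?.len_utf8();
            bytes.extend_from_slice(rest[..len].as_bytes());
            rest = &rest[len..];
        }
    }

    // The collected bytes should now form ordinary UTF-8.
    String::from_utf8(bytes).ok()
}
```

Called on the escaped string above (written as a raw string literal so the backslashes stay literal), this should give back the Cyrillic text. If the assumption is wrong for some part of the dump, it returns None instead of producing gibberish.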