I'm reading a Windows book, and it says that the Windows system APIs use Unicode characters (maybe UTF-16). Rust, however, uses UTF-8 strings.
As I understand it, UTF-8 and UTF-16 have different memory layouts. For example, "12你" takes 5 bytes (1 + 1 + 3) in UTF-8, but 6 bytes (2 + 2 + 2) in UTF-16.
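Those byte counts are easy to check in Rust: `str::len` returns the UTF-8 length in bytes, and `str::encode_utf16` yields UTF-16 code units, each 2 bytes wide. A minimal sketch (the helper name `encoded_sizes` is just for illustration):

```rust
/// Hypothetical helper: returns (UTF-8 size, UTF-16 size) of a string, in bytes.
fn encoded_sizes(s: &str) -> (usize, usize) {
    // `str::len` counts UTF-8 bytes, not characters.
    let utf8_bytes = s.len();
    // Each UTF-16 code unit is a u16, i.e. 2 bytes.
    let utf16_bytes = s.encode_utf16().count() * std::mem::size_of::<u16>();
    (utf8_bytes, utf16_bytes)
}
```

For "12你" this gives (5, 6). Characters outside the Basic Multilingual Plane, such as emoji, need a surrogate pair (two code units) in UTF-16.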
Does that mean there is always an encoding-conversion cost when going from a Rust API to a Windows system API (e.g. fs::File::open("12你"))?
Please correct me if I'm misunderstanding.
Yes, it will re-encode from WTF-8 (a superset of UTF-8 that can also encode unpaired surrogates) to UTF-16.
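Roughly, that re-encoding step turns the path into a NUL-terminated buffer of `u16`s before a wide API such as `CreateFileW` is called. The sketch below is a simplified, portable stand-in that works on `&str`; std itself works on the WTF-8 contents of the `OsStr` (via `OsStrExt::encode_wide` on Windows). The name `to_wide_nul` is hypothetical.

```rust
/// Hypothetical helper: encode a string as NUL-terminated UTF-16,
/// the form expected by Windows -W APIs (LPCWSTR).
/// Returns None if the string contains an interior NUL, because the
/// callee would treat it as the end of the string.
fn to_wide_nul(s: &str) -> Option<Vec<u16>> {
    if s.contains('\0') {
        return None;
    }
    // UTF-16 never needs more code units than UTF-8 needs bytes,
    // so s.len() + 1 is enough capacity for the terminator too.
    let mut wide: Vec<u16> = Vec::with_capacity(s.len() + 1);
    // One allocation plus one pass over the input: this is the
    // per-call conversion cost the question asks about.
    wide.extend(s.encode_utf16());
    wide.push(0);
    Some(wide)
}
```

The cost is linear in the length of the string, which is usually small next to the cost of the system call itself.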
Technically it's not UTF-16 (or WTF-8 would not be required) but UCS-2, i.e. UTF-16 that may contain unpaired surrogates.
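To see why a plain UTF-8 `String` cannot hold every Windows string: a `u16` sequence containing a lone surrogate is not valid UTF-16, so `String::from_utf16` rejects it and `from_utf16_lossy` replaces it with U+FFFD. On Windows, `std::os::windows::ffi::OsStringExt::from_wide` accepts such input losslessly, which is what WTF-8 is for. The helper below (a hypothetical name) uses `char::decode_utf16` to list the offending surrogates.

```rust
/// Hypothetical helper: returns the values of any unpaired surrogates
/// in a UTF-16 sequence, i.e. what makes it invalid UTF-16.
fn unpaired_surrogates(units: &[u16]) -> Vec<u16> {
    char::decode_utf16(units.iter().copied())
        // Valid chars (including properly paired surrogates) are Ok;
        // each lone surrogate comes back as an Err.
        .filter_map(|r| r.err())
        .map(|e| e.unpaired_surrogate())
        .collect()
}
```

For example, `[0x61, 0xD800, 0x62]` ("a", a lone high surrogate, "b") yields `[0xD800]`, while the proper pair `[0xD83D, 0xDE00]` (U+1F600) yields nothing.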
For some history, see rfcs/0517-io-os-reform.md in the rust-lang/rfcs repository on GitHub.
It has always bothered me that Windows `OsString` is not wrapping `[u16]`, but I guess the Rust authors felt that the cost of converting between `str` and `OsString` mattered more than the cost of passing an `OsString` to the OS. This is probably correct: OS calls are already expected to be expensive, while you might be stitching lots of strings together in Rust before that call.
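A small sketch of that trade-off: the std docs note that `OsString` is stored as a less strict variant of UTF-8 even on Windows, so `&str` pieces can be appended to an `OsString` without re-encoding, and the single UTF-16 conversion happens only at the OS call. `join_os` is a hypothetical name.

```rust
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: stitch several &str pieces into one OsString.
fn join_os(parts: &[&str]) -> OsString {
    let mut out = OsString::new();
    for &part in parts {
        // &str -> &OsStr is a zero-cost borrow, and for UTF-8 input
        // the push is effectively a byte append (no UTF-16 involved).
        out.push(OsStr::new(part));
    }
    out
}
```

Had `OsString` wrapped `[u16]` on Windows, every one of these pushes would have needed its own UTF-8 to UTF-16 conversion.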
It's worth noting that Windows does actually support using a UTF-8 system locale "codepage" (codepage 65001) in the -A APIs (as opposed to the -W APIs, which take WTF-16). However, this requires the Windows user to set an experimental flag that changes the behavior globally, and it cannot really be set on a per-executable basis, unless I've missed something.
Moreover, the support is still listed as beta, so it's not really advisable for widespread use. Using the wide APIs always works, no matter what the locale is.
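To find out at run time how the -A APIs behave in the current process, Win32's `GetACP` returns the active ANSI code page, and 65001 (`CP_UTF8`) means UTF-8. The sketch below declares `GetACP` by hand from kernel32 instead of using a bindings crate such as `windows-sys`; the wrapper names are hypothetical, and off Windows it simply returns `None`.

```rust
/// Code page identifier for UTF-8 (CP_UTF8 in the Windows SDK headers).
const CP_UTF8: u32 = 65001;

/// Hypothetical wrapper around Win32 `GetACP`: the process's active
/// ANSI code page, i.e. how -A APIs interpret `char*` strings.
#[cfg(windows)]
fn ansi_code_page() -> Option<u32> {
    // Declared by hand (on the 2024 edition write `unsafe extern`).
    #[link(name = "kernel32")]
    extern "system" {
        fn GetACP() -> u32;
    }
    // SAFETY: GetACP takes no arguments and has no preconditions.
    Some(unsafe { GetACP() })
}

/// Off Windows there is no ANSI code page.
#[cfg(not(windows))]
fn ansi_code_page() -> Option<u32> {
    None
}

/// Some(true) if the -A APIs in this process treat strings as UTF-8.
fn ansi_apis_use_utf8() -> Option<bool> {
    ansi_code_page().map(|cp| cp == CP_UTF8)
}
```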
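You're quite a bit out of date: see "Use UTF-8 code pages in Windows apps" on Microsoft Docs. Since Windows 10 version 1903, an application can opt in to the UTF-8 code page per executable through its manifest.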
It's not that useful for performance, though, since Windows just converts to UTF-16 internally (as it does for all -A APIs).