The cost of Windows API encoding?

Hi,
I'm reading a Windows book. It says that the Windows system APIs use Unicode characters (maybe UTF-16); however, Rust uses UTF-8 strings.

As I understand it, UTF-8 and UTF-16 have different memory layouts (e.g. 12你 takes 5 bytes (1 + 1 + 3) in UTF-8, but 6 bytes (2 + 2 + 2) in UTF-16).
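To check the byte counts above, here is a minimal sketch using only the standard library. The helper name `encoded_sizes` is made up for illustration; it relies on `str::len` returning the UTF-8 byte length and on `str::encode_utf16` yielding one `u16` per UTF-16 code unit.

```rust
/// Returns (UTF-8 size in bytes, UTF-16 size in bytes) for a string.
/// `encoded_sizes` is a hypothetical helper, not part of std.
fn encoded_sizes(s: &str) -> (usize, usize) {
    // `str::len` counts bytes of the UTF-8 encoding, not characters.
    let utf8_bytes = s.len();
    // Each UTF-16 code unit is a u16, i.e. 2 bytes.
    let utf16_bytes = s.encode_utf16().count() * 2;
    (utf8_bytes, utf16_bytes)
}
```

For "12你" this gives 5 and 6 bytes, matching the arithmetic in the question. Neither encoding is always smaller: ASCII-only text doubles in size in UTF-16, while a character outside the Basic Multilingual Plane takes 4 bytes in both.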

Does this mean there is always a cost for encoding conversion when going from a Rust API to a Windows system API (e.g. fs::File::open("12你"))?

Please correct me if I'm misunderstanding.

Yes, it will re-encode from WTF-8 (a UTF-8 superset that can also encode unpaired surrogates) to UTF-16.
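As a rough illustration of what that re-encoding involves, here is a sketch that turns a `&str` into the NUL-terminated UTF-16 buffer a wide (-W) API expects. The name `to_wide_null` is hypothetical, and this is not the code the standard library actually uses; it only handles valid UTF-8 input. On Windows, the equivalent for an `OsStr` is `std::os::windows::ffi::OsStrExt::encode_wide`.

```rust
/// Re-encodes a Rust string as UTF-16 code units with a trailing NUL.
/// `to_wide_null` is a hypothetical helper for illustration only.
fn to_wide_null(s: &str) -> Vec<u16> {
    s.encode_utf16()              // walk the UTF-8 data, emitting UTF-16 units
        .chain(std::iter::once(0)) // wide APIs expect a terminating 0
        .collect()                 // allocates a new buffer on every call
}
```

The cost is one pass over the string plus one allocation per call, which is usually small next to the system call itself.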

2 Likes

Technically not UTF-16 (or WTF-8 would not be required), but UCS-2 / UTF-16 with unpaired surrogates.
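To see why unpaired surrogates matter, here is a small sketch showing that a `[u16]` sequence containing a lone surrogate cannot become a Rust `String` without loss. The helper `decode_wide` is made up for this example; `String::from_utf16` and `String::from_utf16_lossy` are the real std functions.

```rust
/// Attempts a strict UTF-16 decode and falls back to a lossy one.
/// Returns (was_well_formed, decoded_text).
/// `decode_wide` is a hypothetical helper for illustration only.
fn decode_wide(units: &[u16]) -> (bool, String) {
    match String::from_utf16(units) {
        // Well-formed UTF-16: every surrogate is correctly paired.
        Ok(s) => (true, s),
        // Ill-formed: lone surrogates are replaced with U+FFFD.
        Err(_) => (false, String::from_utf16_lossy(units)),
    }
}
```

A sequence such as `[0x31, 0xD800, 0x32]` fails the strict decode, because 0xD800 is a high surrogate with no low surrogate after it. Windows accepts such sequences in file names, which is why `OsString` uses WTF-8 rather than strict UTF-8 there.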

1 Like

For some history: rfcs/0517-io-os-reform.md at master · rust-lang/rfcs · GitHub

It has always bothered me that the Windows OsString does not wrap [u16], but I guess the Rust authors felt that keeping String / OsString conversion cheap mattered more than keeping the OsString-to-OS conversion cheap. This is probably correct: OS calls are already expected to be expensive, while you might be stitching lots of strings together in Rust before that call.

2 Likes

It's worth noting that Windows does actually support a UTF-8 system locale code page (code page 65001) in the -A APIs (as opposed to the -W APIs, which take WTF-16). However, this requires the Windows user to set an experimental flag that changes the behavior globally, and it cannot really be changed on a per-executable basis, unless I've missed something.

However, the support is still listed as beta, and as such it's not really recommendable for widespread use. Using the wide APIs always works, no matter what the locale is.

1 Like

You're quite a bit out of date: Use UTF-8 code pages in Windows apps - Windows apps | Microsoft Docs

It's not that useful for performance, as Windows just does the conversion internally (like it does for all -A APIs).

3 Likes
