The conversion copies the data, and includes an allocation on the heap.
Looking further into Buf::from_string (which OsString::from::<String> calls), it looks like there's a version for Windows and a version for everything else. The version for everything else doesn't have any clear allocations (it just stores the bytes of the string directly). The version for Windows delegates to Wtf8Buf::from_string, but that doesn't seem to allocate either.
So does OsString::from::<String> actually allocate? Is the documentation for PathBuf::from::<String> and PathBuf::from::<OsString> wrong? Did I follow the chain of delegations incorrectly somewhere?
Yes. An OsString is always a superset of UTF-8. On Windows the standard library will convert the OsString from/to a UTF-16 vector on every call to the Windows API.
If you call the Windows API yourself you have to manually do the conversion (or use a crate that does it for you).
To be specific on windows it is Vec<u8> like any other string, but encoded as WTF-8 (due to windows being quite nasty and stupid platform when it comes to Unicode)
That last part is in correct. Windows has a very strong convention of using Unicode and has done so for far longer than Linux has.
The difference is Linux applications now (mostly) use UTF-8 by default. Whereas the Windows kernel uses UTF-16. I wouldn't call UTF-16 "nasty and stupid" even if UTF-8 would be preferable nowadays.
Windows uses a modified form of UTF-16 which allows for unmatched surrogate pairs. This is why OsString uses the WTF-8 encoding on Windows instead of simply wrapping a UTF-8 string, there are strings that Windows considers valid which are invalid Unicode strings.
Linux does not use any form of Unicode, it just uses arbitrary byte sequences without internal 0 bytes, which a lot of userspace utilities then attempt to interpret as UTF-8.
No, if Rust's str was UTF-16 it still wouldn't be compatible with Windows APIs, there would have to be a separate OsString that allows unpaired surrogates still. Though there would be a zero-copy conversion from String to OsString since it would be a superset of UTF-16.
This is basically the same as what happens with str and OsStr on Linux today, str can contain a subset of what OsStr can, so there's a zero-copy conversion from str to OsStr but converting back is fallible.
I think I've not made myself clear. Rust's requirement that OsStr be zero-copy from a str is what makes WTF-8 necessary. If that wasn't true then an OsStr could live up to its name and be an more OS native string type.
As it is an OsStr, despite its name, is not an OS native string type.
That finally brings us back to the topic of this post. Do you have any evidence that the conversion from String to OsStringisn't zero-copy (on Windows)? Here's the source of Wtf8Buf::from_string (linked in the OP).
string.into_bytes() is zero-copy, and the function isn't doing anything with those bytes: it's just setting a field value. In case you aren't convinced, OsString is internally just a Buf