I am writing an application that needs to store multiple paths (and other things that are not relevant here) inside an SQLite database. The application will be cross-platform, which is why I am trying to save each path in its OS-native encoding, without converting it to another encoding and losing or corrupting data. I am using sqlx, and all paths are Rust PathBufs.
What I tried:
At first I simply converted the path to a string and ignored all errors. But that is just a bad idea.
After that, I tried to save the OsString directly in SQL. This did not work, because sqlx cannot store an OsString, and I could not implement the necessary traits myself (both the traits and OsString come from other crates, so the orphan rule forbids it).
My last idea is to dump the raw bytes of the OsString and read those bytes back into an OsString every time I need the path. But I don't really know how to start. Since, for implementation reasons, it is not really possible to get the raw bytes of an OsString, I searched around and looked at some libraries (os_str_bytes and bstr), but I don't know if they are the right choice, as I lack experience with OsStrings.
How can I safely store these paths in my SQLite database? Is the last idea the right direction?
On Unix, Rust lets you get the raw bytes of a path (through std::os::unix::ffi::OsStrExt), and you can store them as a BLOB.
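A minimal sketch of the Unix side, assuming the BLOB column is read back into an owned Vec<u8>. The helper names path_to_blob and blob_to_path are hypothetical; the conversions themselves are the standard OsStrExt::as_bytes and OsStringExt::from_vec, so no bytes are reinterpreted:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: copy the path's raw OS bytes, with no encoding step.
#[cfg(unix)]
fn path_to_blob(path: &Path) -> Vec<u8> {
    use std::os::unix::ffi::OsStrExt;
    path.as_os_str().as_bytes().to_vec()
}

/// Hypothetical helper: rebuild the path from the bytes read out of the BLOB.
#[cfg(unix)]
fn blob_to_path(blob: Vec<u8>) -> PathBuf {
    use std::ffi::OsString;
    use std::os::unix::ffi::OsStringExt;
    PathBuf::from(OsString::from_vec(blob))
}

#[cfg(unix)]
fn main() {
    use std::ffi::OsStr;
    use std::os::unix::ffi::OsStrExt;

    // 0xFF never appears in valid UTF-8, so to_str() fails on this path.
    let original = Path::new(OsStr::from_bytes(b"/tmp/caf\xFF"));
    assert!(original.to_str().is_none());

    let restored = blob_to_path(path_to_blob(original));
    assert_eq!(restored.as_path(), original);
    println!("round-trip ok");
}

#[cfg(not(unix))]
fn main() {}
```

Because the bytes are copied verbatim, a path that is not valid UTF-8 survives the round trip.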
On Windows, you will need to store paths as possibly ill-formed UTF-16 (effectively UCS-2), just as Windows itself does. Alternatively, you can accept breaking Windows paths that contain unpaired surrogates and convert them to UTF-8, which lets you reuse the byte-oriented BLOB approach used for Unix.
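Why the UTF-8 route is lossy can be shown without touching the file system. This sketch (the helper name via_utf8 is hypothetical) decodes UTF-16 code units into a Rust String and encodes them back; an unpaired surrogate, which a Windows file name may contain, does not survive:

```rust
/// Hypothetical helper: decode UTF-16 into a UTF-8 String (lossily), then
/// re-encode it, returning the code units that survive the trip.
fn via_utf8(units: &[u16]) -> Vec<u16> {
    String::from_utf16_lossy(units).encode_utf16().collect()
}

fn main() {
    // 'a', an unpaired high surrogate, 'b'.
    let units = [0x0061u16, 0xD800, 0x0062];

    // Strict decoding rejects the unpaired surrogate outright...
    assert!(String::from_utf16(&units).is_err());

    // ...and lossy decoding replaces it with U+FFFD, so the original
    // code units cannot be recovered from the UTF-8 string.
    assert_eq!(via_utf8(&units), vec![0x0061, 0xFFFD, 0x0062]);

    // Well-formed UTF-16 (here "é") round-trips unchanged.
    assert_eq!(via_utf8(&[0x00E9]), vec![0x00E9]);
}
```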
On Windows, you can convert the path to a Vec<u16>, and on other platforms to a Vec<u8>. Storing the resulting vector is enough. I don't think there is any standard lossless way to convert the Vec<u16> from Windows into a Vec<u8>, but you could come up with your own.
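One possible home-grown lossless mapping, sketched under the assumption that a fixed little-endian byte order is acceptable (the function names are hypothetical): each u16 becomes exactly two bytes, so the result fits the same BLOB column as on Unix. On Windows, the Vec<u16> would come from std::os::windows::ffi::OsStrExt::encode_wide, and the decoded units would go back through OsStringExt::from_wide.

```rust
/// Hypothetical helper: pack UTF-16 code units into bytes, always
/// little-endian, so the BLOB means the same thing on every machine.
fn u16s_to_bytes(units: &[u16]) -> Vec<u8> {
    units.iter().flat_map(|u| u.to_le_bytes()).collect()
}

/// Hypothetical helper: reverse u16s_to_bytes. Returns None for an odd
/// byte count, which cannot be a valid encoding.
fn bytes_to_u16s(bytes: &[u8]) -> Option<Vec<u16>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect(),
    )
}

fn main() {
    // Includes an unpaired surrogate (0xD800) to show nothing is lost.
    let units = vec![0x0043, 0xD800, 0x00E9];
    let blob = u16s_to_bytes(&units);
    assert_eq!(blob, vec![0x43, 0x00, 0x00, 0xD8, 0xE9, 0x00]);
    assert_eq!(bytes_to_u16s(&blob), Some(units));
    assert_eq!(bytes_to_u16s(&[0x01, 0x02, 0x03]), None);
}
```

Choosing one explicit byte order, rather than the machine's native one, keeps the stored data readable on any architecture.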
Both ideas sound great. If I converted the Windows path to UTF-8, would I always have to convert it back before using it? And do I need to configure something in the SQL database, or can I save both Vec<u16> and Vec<u8> directly via sqlx?
I think I have now implemented it correctly. It works, but can one of you confirm whether this is the right way? I changed the database table to store the path as a BLOB. Now, every time I save data to or read it from the database, I convert it using those methods.
Is this the right way? Or should I remove the Windows "byte mapping" and create separate methods for saving the bytes: one for Windows that saves/reads a Vec<u16>, and one for Linux that saves/reads a Vec<u8>?
Do you also have tips on how to write tests for these methods, to check that they actually convert correctly?
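One way to test such conversions, sketched with a hypothetical encode/decode pair for UTF-16 code units: check every possible u16 exhaustively (including unpaired surrogates), pin the exact byte layout with a fixed expected value, and add a platform-specific round trip for a path that is not valid Unicode. The tests run with cargo test:

```rust
/// Hypothetical codec under test: UTF-16 code units <-> little-endian bytes.
fn encode(units: &[u16]) -> Vec<u8> {
    units.iter().flat_map(|u| u.to_le_bytes()).collect()
}

/// Inverse of `encode`; `None` if the byte count is odd.
fn decode(bytes: &[u8]) -> Option<Vec<u16>> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    Some(bytes.chunks_exact(2).map(|p| u16::from_le_bytes([p[0], p[1]])).collect())
}

fn main() {
    // Quick smoke check; the full checks live in the test module below.
    assert_eq!(decode(&encode(&[0xD800])), Some(vec![0xD800]));
    println!("smoke check passed; run `cargo test` for the full suite");
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn every_code_unit_round_trips() {
        // Exhaustive: all 65,536 values, including unpaired surrogates.
        let all: Vec<u16> = (0..=u16::MAX).collect();
        assert_eq!(decode(&encode(&all)), Some(all));
    }

    #[test]
    fn byte_layout_is_fixed() {
        // A fixed expected value: fails if the encoding switches to
        // to_be_bytes, or to to_ne_bytes on a big-endian target.
        assert_eq!(encode(&[0x1234]), vec![0x34, 0x12]);
    }

    #[test]
    fn odd_length_is_rejected() {
        assert_eq!(decode(&[0x00, 0x01, 0x02]), None);
    }

    #[cfg(windows)]
    #[test]
    fn windows_name_with_unpaired_surrogate_round_trips() {
        use std::ffi::OsString;
        use std::os::windows::ffi::{OsStrExt, OsStringExt};
        let original = OsString::from_wide(&[0x0061, 0xD800, 0x0062]);
        let units: Vec<u16> = original.encode_wide().collect();
        let restored = OsString::from_wide(&decode(&encode(&units)).unwrap());
        assert_eq!(restored, original);
    }

    #[cfg(unix)]
    #[test]
    fn unix_name_with_invalid_utf8_round_trips() {
        use std::ffi::{OsStr, OsString};
        use std::os::unix::ffi::{OsStrExt, OsStringExt};
        let original = OsStr::from_bytes(b"caf\xFF");
        let restored = OsString::from_vec(original.as_bytes().to_vec());
        assert_eq!(restored.as_os_str(), original);
    }
}
```

The fixed-value test matters as much as the round trip: a round trip on a single machine passes even with native byte order, so only a pinned layout catches data that would be misread elsewhere.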
Your back-conversion uses from_ne_bytes(), which will break on big-endian platforms; use a fixed byte order in both directions instead, for example to_le_bytes() and from_le_bytes(). Also, the code for converting back from a blob on Unix wouldn't compile, because from_vec() takes a Vec<u8>, not a slice (it takes ownership so as to avoid unnecessary clones).
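A small illustration of both points, assuming a Unix target for the second part (blob_to_os_string is a hypothetical name): fixed-order conversions give the same value everywhere, while from_ne_bytes depends on the compile target; and a borrowed slice has to be copied into a Vec<u8> before it can be handed to from_vec.

```rust
/// Hypothetical Unix helper: rebuild an OsString from a borrowed BLOB.
/// OsString::from_vec takes ownership of a Vec<u8>, so a slice must be
/// copied first; if you already own the Vec, pass it in directly instead.
#[cfg(unix)]
fn blob_to_os_string(blob: &[u8]) -> std::ffi::OsString {
    use std::os::unix::ffi::OsStringExt;
    std::ffi::OsString::from_vec(blob.to_vec())
}

fn main() {
    let bytes = [0x34u8, 0x12];

    // Explicit byte orders mean the same thing on every machine.
    assert_eq!(u16::from_le_bytes(bytes), 0x1234);
    assert_eq!(u16::from_be_bytes(bytes), 0x3412);

    // Native order follows the compile target, so a BLOB written on a
    // little-endian machine is misread on a big-endian one (and vice versa).
    let native = u16::from_ne_bytes(bytes);
    if cfg!(target_endian = "little") {
        assert_eq!(native, 0x1234);
    } else {
        assert_eq!(native, 0x3412);
    }

    #[cfg(unix)]
    {
        let restored = blob_to_os_string(b"a\xFFb");
        assert_eq!(restored.len(), 3);
    }
}
```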