I think it's trivial with some regex and UTF-8 operations, but is there a crate similar to ECMAScript's encodeURI(...) and decodeURI(...) for encoding general Unicode code points into %xx sequences and back into Unicode code points?
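To make the question concrete, here is a minimal std-only sketch of what "encoding a code point into %xx sequences" means: the character is turned into its UTF-8 bytes and each byte is written as % followed by two uppercase hex digits. The function name percent_encode_char is made up for illustration and is not taken from any crate.

```rust
/// Percent-encode a single Unicode code point as its UTF-8 bytes.
/// Hypothetical helper for illustration only; it encodes every character
/// it is given, including ASCII letters.
fn percent_encode_char(c: char) -> String {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf)
        .bytes()
        .map(|b| format!("%{:02X}", b))
        .collect()
}
```

For example, passing '\u{00E3}' (ã) should give "%C3%A3", matching the escapes shown later in this thread.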
You could use the types in the http crate. I believe they take care of escaping?
urlencoding is quite popular
I tried:
use std::str::FromStr;
use http::uri::Uri;

pub fn encode_uri(s: impl AsRef<str>) -> String {
    Uri::from_str(s.as_ref()).expect("URI malformed").to_string()
}
When I give it something like app-storage:// it fails... If the domain/path is filled, it works. Hmm... doesn't work for my case.
Sadly, urlencoding escapes everything. I gave it app-storage://ã and it escaped even the :// characters:
app-storage%3A%2F%2F%C3%A3
I think you want the url crate:
use url::Url;

fn main() {
    let s = "app-storage://ã";
    println!("{}", Url::parse(s).unwrap());
}
Output
app-storage://%C3%A3
It also works when parsing just "app-storage://" without a host.
Is there a way to decode the host and path? I know of urlencoding, but it encodes the slashes...
Right, I can probably use a regex to split the combined host and path on slashes and map each segment through urlencoding::decode.
What do you mean by decode? Url parses the URL and handles the encoding of the path.
If you want the raw, unencoded URL, you can use urlencoding::decode on the encoded URL string. It will ignore everything that is not %XX-encoded, so you don't need to split the URL and decode the different parts one by one:
println!("{}", urlencoding::decode("app-storage://%C3%A3").unwrap());
Stdout:
app-storage://ã
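The "ignore everything that is not %XX" behaviour can be sketched with the standard library alone. This is only an illustration of the idea, not how urlencoding is implemented; the names hex_val and percent_decode are hypothetical, and invalid UTF-8 is replaced lossily here rather than reported as an error.

```rust
/// Value of one ASCII hex digit, or None if the byte is not a hex digit.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decode %XX escapes; bytes that are not part of a valid escape are
/// copied through unchanged, so "://" and "/" survive as-is.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        // A valid escape needs '%' plus two hex digits inside the input.
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    // Assumption: malformed UTF-8 is replaced rather than rejected.
    String::from_utf8_lossy(&out).into_owned()
}
```

Because the decoder only touches well-formed escapes, it can be applied to the whole URI string at once.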
Oops, I've mixed things up a bit here. By decode I meant translating a URI into a native path for the host operating system.
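One possible reading of "translate into a native path" is sketched below, assuming the path part of the URI has already been extracted and percent-decoded. The function name uri_path_to_native is hypothetical; it only splits on '/' and lets std::path::PathBuf pick the platform's separator, and it always produces a relative path.

```rust
use std::path::PathBuf;

/// Build a platform-native relative path from a slash-separated URI path.
/// Assumes the segments are already percent-decoded; empty segments
/// (leading, trailing, or doubled slashes) are dropped.
fn uri_path_to_native(uri_path: &str) -> PathBuf {
    uri_path
        .split('/')
        .filter(|segment| !segment.is_empty())
        .collect()
}
```

Mapping the scheme (file:, app:, ...) to a base directory is application-specific and is left out of this sketch.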
Apart from the decoding matter, I have only one issue with the url crate: it requires a "host". In my case I need only a path. I'm using URIs like file:, app:, ... which may begin with a directory name like Física. I tried parsing a URI with that as the host and it was reported as malformed.
What about using the uriparse crate? Unlike the url crate and http, this should work for arbitrary URIs and not just URLs.
I tried:
println!("{}", uriparse::URI::try_from("app://Física").unwrap().host().unwrap());
Got an InvalidIPv4OrRegisteredNameCharacter error.
I think the uriparse crate only supports URIs and not IRIs (Internationalized Resource Identifiers). URIs are limited to ASCII, while IRIs allow full Unicode.
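A quick way to see why "app://Física" is rejected is to check for non-ASCII characters before handing the string to a URI-only parser. The helper name has_non_ascii is made up for this sketch; passing the check does not by itself make a string a valid URI.

```rust
/// Returns true when `s` contains characters outside ASCII, meaning it can
/// at best be an IRI and would need percent-encoding to become a URI.
fn has_non_ascii(s: &str) -> bool {
    !s.is_ascii()
}
```

Here has_non_ascii("app://Física") would be true, while the percent-encoded form "app://F%C3%ADsica" is plain ASCII.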
The iri-string crate (disclaimer: I am the author) can be used to encode a Unicode string into a URI-encoded string, and you can control in what context the string should be encoded (i.e. whether to encode # or not, whether to encode / or not, etc.).
However, it cannot naively decode the encoded URI into an IRI without applying some normalization.
I resolved to implement percent encoding and decoding myself using just lazy_regex:
use lazy_regex::regex_replace_all;

pub fn encode_uri(s: impl AsRef<str>) -> String {
    // Percent-encode every character outside the allowed set, byte by byte.
    regex_replace_all!(r"[^A-Za-z0-9_\-\.:/\\]", s.as_ref(), |seq: &str| {
        let mut r = String::new();
        for ch in seq.bytes() {
            r.push('%');
            r.push_str(&octet_to_hex(ch));
        }
        r
    }).into_owned()
}

pub fn decode_uri(s: impl AsRef<str>) -> String {
    // Decode each run of consecutive %XX escapes as one UTF-8 byte sequence.
    regex_replace_all!(r"(%[A-Fa-f0-9]{2})+", s.as_ref(), |seq: &str, _| {
        let inp = seq.as_bytes();
        let mut r = Vec::<u8>::new();
        let mut i: usize = 0;
        while i < inp.len() {
            // Skip the '%' at inp[i] and parse the two hex digits after it.
            let hex = String::from_utf8_lossy(&inp[i + 1..i + 3]);
            r.push(u8::from_str_radix(&hex, 16).unwrap_or(0));
            i += 3;
        }
        String::from_utf8_lossy(&r).into_owned()
    }).into_owned()
}

fn octet_to_hex(arg: u8) -> String {
    // Two uppercase hex digits, zero-padded.
    let r = format!("{:x}", arg);
    ((if r.len() == 1 { "0" } else { "" }).to_owned() + &r).to_uppercase()
}
For now I decided to leave the colon unencoded.
octet_to_hex() can be simplified by using {:02X} formatting.
fn main() {
    assert_eq!(format!("{:02X}", 1), "01");
    assert_eq!(format!("{:02X}", 254), "FE");
}